Method and apparatus for compressing and decompressing a higher order ambisonics signal representation

ABSTRACT

Higher Order Ambisonics (HOA) represents a complete sound field in the vicinity of a sweet spot, independent of loudspeaker set-up. The high spatial resolution requires a high number of HOA coefficients. In the invention, dominant sound directions are estimated and the HOA signal representation is decomposed into dominant directional signals in time domain and related direction information, and an ambient component in HOA domain, followed by compression of the ambient component by reducing its order. The reduced-order ambient component is transformed to the spatial domain, and is perceptually coded together with the directional signals. At receiver side, the encoded directional signals and the order-reduced encoded ambient component are perceptually decompressed, the perceptually decompressed ambient signals are transformed to an HOA domain representation of reduced order, followed by order extension. The total HOA representation is recomposed from the directional signals, the corresponding direction information, and the original-order ambient HOA component.

The invention relates to a method and to an apparatus for compressing and decompressing a Higher Order Ambisonics signal representation, wherein directional and ambient components are processed in a different manner.

BACKGROUND

Higher Order Ambisonics (HOA) offers the advantage of capturing a complete sound field in the vicinity of a specific location in the three-dimensional space, which location is called ‘sweet spot’. Such HOA representation is independent of a specific loudspeaker set-up, in contrast to channel-based techniques like stereo or surround. But this flexibility is at the expense of a decoding process required for playback of the HOA representation on a particular loudspeaker set-up.

HOA is based on the description of the complex amplitudes of the air pressure for individual angular wave numbers k for positions x in the vicinity of a desired listener position, which without loss of generality may be assumed to be the origin of a spherical coordinate system, using a truncated Spherical Harmonics (SH) expansion. The spatial resolution of this representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, i.e. O=(N+1)². For example, typical HOA representations using order N=4 require O=25 HOA coefficients. Given a desired sampling rate f_(s) and the number N_(b) of bits per sample, the total bit rate for the transmission of an HOA signal representation is determined by O·f_(s)·N_(b), and transmission of an HOA signal representation of order N=4 with a sampling rate of f_(s)=48 kHz employing N_(b)=16 bits per sample results in a bit rate of 19.2 MBits/s. Thus, compression of HOA signal representations is highly desirable.
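The bit-rate figure quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `hoa_bit_rate` and its parameter names are illustrative and do not appear in the source.

```python
def hoa_bit_rate(order: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw (uncompressed) bit rate in bit/s of an HOA representation."""
    num_coeffs = (order + 1) ** 2  # O = (N + 1)^2 coefficient sequences
    return num_coeffs * sample_rate_hz * bits_per_sample  # O * f_s * N_b


# Order N = 4, f_s = 48 kHz, N_b = 16 bits per sample
rate = hoa_bit_rate(order=4, sample_rate_hz=48_000, bits_per_sample=16)
print(rate / 1e6)  # 19.2 (MBits/s)
```

Because O grows with (N+1)², raising the order from 4 to 6 alone raises the raw rate from 25 to 49 coefficient sequences at the same sampling rate and word length.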

An overview of existing spatial audio compression approaches can be found in patent application EP 10306472.1 or in I. Elfitri, B. Günel, A. M. Kondoz, “Multichannel Audio Coding Based on Analysis by Synthesis”, Proceedings of the IEEE, vol. 99, no. 4, pp. 657-670, April 2011.

The following techniques are more relevant with respect to the invention.

B-format signals, which are equivalent to Ambisonics representations of first order, can be compressed using Directional Audio Coding (DirAC) as described in V. Pulkki, “Spatial Sound Reproduction with Directional Audio Coding”, Journal of Audio Eng. Society, vol. 55(6), pp. 503-516, 2007. In one version proposed for teleconference applications, the B-format signal is coded into a single omni-directional signal as well as side information in the form of a single direction and a diffuseness parameter per frequency band. However, the resulting drastic reduction of the data rate comes at the price of an inferior signal quality obtained at reproduction. Further, DirAC is limited to the compression of Ambisonics representations of first order, which suffer from a very low spatial resolution.

The known methods for compression of HOA representations with N>1 are quite rare. One of them performs direct encoding of individual HOA coefficient sequences employing the perceptual Advanced Audio Coding (AAC) codec, cf. E. Hellerud, I. Burnett, A. Solvang, U. Peter Svensson, “Encoding Higher Order Ambisonics with AAC”, 124th AES Convention, Amsterdam, 2008. However, the inherent problem with such an approach is the perceptual coding of signals that are never listened to. The reconstructed playback signals are usually obtained by a weighted sum of the HOA coefficient sequences. That is why there is a high probability for the unmasking of perceptual coding noise when the decompressed HOA representation is rendered on a particular loudspeaker set-up. In more technical terms, the major problem for perceptual coding noise unmasking is the high cross-correlations between the individual HOA coefficient sequences. Because the coded noise signals in the individual HOA coefficient sequences are usually uncorrelated with each other, there may occur a constructive superposition of the perceptual coding noise while at the same time the noise-free HOA coefficient sequences are cancelled at superposition. A further problem is that the mentioned cross-correlations lead to a reduced efficiency of the perceptual coders.

In order to minimise the extent of these effects, it is proposed in EP 10306472.1 to transform the HOA representation to an equivalent representation in the spatial domain before perceptual coding. The spatial domain signals correspond to conventional directional signals, and would correspond to the loudspeaker signals if the loudspeakers were positioned in exactly the same directions as those assumed for the spatial domain transform.

The transform to spatial domain reduces the cross-correlations between the individual spatial domain signals. However, the cross-correlations are not completely eliminated. An example for relatively high cross-correlations is a directional signal, whose direction falls in-between the adjacent directions covered by the spatial domain signals.

A further disadvantage of EP 10306472.1 and the above-mentioned Hellerud et al. article is that the number of perceptually coded signals is (N+1)², where N is the order of the HOA representation. Therefore the data rate for the compressed HOA representation is growing quadratically with the Ambisonics order.

The inventive compression processing performs a decomposition of an HOA sound field representation into a directional component and an ambient component. In particular for the computation of the directional sound field component a new processing is described below for the estimation of several dominant sound directions.

Regarding existing methods for direction estimation based on Ambisonics, the above-mentioned Pulkki article describes one method in connection with DirAC coding for the estimation of the direction, based on the B-format sound field representation. The direction is obtained from the average intensity vector, which points to the direction of flow of the sound field energy. An alternative based on the B-format is proposed in D. Levin, S. Gannot, E. A. P. Habets, “Direction-of-Arrival Estimation using Acoustic Vector Sensors in the Presence of Noise”, IEEE Proc. of the ICASSP, pp. 105-108, 2011. The direction estimation is performed iteratively by searching for that direction which provides the maximum power of a beam former output signal steered into that direction.

However, both approaches are constrained to the B-format for the direction estimation, which suffers from a relatively low spatial resolution. An additional disadvantage is that the estimation is restricted to only a single dominant direction.

HOA representations offer an improved spatial resolution and thus allow an improved estimation of several dominant directions. The existing methods performing an estimation of several directions based on HOA sound field representations are quite rare. An approach based on compressive sensing is proposed in N. Epain, C. Jin, A. van Schaik, “The Application of Compressive Sampling to the Analysis and Synthesis of Spatial Sound Fields”, 127th Convention of the Audio Eng. Soc., New York, 2009, and in A. Wabnitz, N. Epain, A. van Schaik, C. Jin, “Time Domain Reconstruction of Spatial Sound Fields Using Compressed Sensing”, IEEE Proc. of the ICASSP, pp. 465-468, 2011. The main idea is to assume the sound field to be spatially sparse, i.e. to consist of only a small number of directional signals. Following allocation of a high number of test directions on the sphere, an optimisation algorithm is employed in order to find as few test directions as possible together with the corresponding directional signals, such that they are well described by the given HOA representation. This method provides an improved spatial resolution compared to that which is actually provided by the given HOA representation, since it circumvents the spatial dispersion resulting from a limited order of the given HOA representation. However, the performance of the algorithm heavily depends on whether the sparsity assumption is satisfied. In particular, the approach fails if the sound field contains any minor additional ambient components, or if the HOA representation is affected by noise which will occur when it is computed from multi-channel recordings.

A further, rather intuitive method is to transform the given HOA representation to the spatial domain as described in B. Rafaely, “Plane-wave decomposition of the sound field on a sphere by spherical convolution”, J. Acoust. Soc. Am., vol. 116, no. 4, pp. 2149-2157, October 2004, and then to search for maxima in the directional powers. The disadvantage of this approach is that the presence of ambient components leads to a blurring of the directional power distribution and to a displacement of the maxima of the directional powers compared to the absence of any ambient component.

INVENTION

A problem to be solved by the invention is to provide a compression for HOA signals whereby the high spatial resolution of the HOA signal representation is still kept. This problem is solved by the methods disclosed in claims 1 and 2. Apparatuses that utilise these methods are disclosed in claims 3 and 4.

The invention addresses the compression of Higher Order Ambisonics (HOA) representations of sound fields. In this application, the term ‘HOA’ denotes the Higher Order Ambisonics representation as such as well as a correspondingly encoded or represented audio signal. Dominant sound directions are estimated and the HOA signal representation is decomposed into a number of dominant directional signals in time domain and related direction information, and an ambient component in HOA domain, followed by compression of the ambient component by reducing its order. After that decomposition, the ambient HOA component of reduced order is transformed to the spatial domain, and is perceptually coded together with the directional signals.

At receiver or decoder side, the encoded directional signals and the order-reduced encoded ambient component are perceptually decompressed. The perceptually decompressed ambient signals are transformed to an HOA domain representation of reduced order, followed by order extension. The total HOA representation is re-composed from the directional signals and the corresponding direction information and from the original-order ambient HOA component.

Advantageously, the ambient sound field component can be represented with sufficient accuracy by an HOA representation having a lower than original order, and the extraction of the dominant directional signals ensures that, following compression and decompression, a high spatial resolution is still achieved.

In principle, the inventive method is suited for compressing a Higher Order Ambisonics HOA signal representation, said method including the steps:

-   estimating dominant directions, wherein said dominant direction estimation is dependent on a directional power distribution of the energetically dominant HOA components;
-   decomposing or decoding the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals;
-   compressing said residual ambient component by reducing its order as compared to its original order;
-   transforming said residual ambient HOA component of reduced order to the spatial domain;
-   perceptually encoding said dominant directional signals and said transformed residual ambient HOA component.

In principle, the inventive method is suited for decompressing a Higher Order Ambisonics HOA signal representation that was compressed by the steps:

-   estimating dominant directions, wherein said dominant direction estimation is dependent on a directional power distribution of the energetically dominant HOA components;
-   decomposing or decoding the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals;
-   compressing said residual ambient component by reducing its order as compared to its original order;
-   transforming said residual ambient HOA component of reduced order to the spatial domain;
-   perceptually encoding said dominant directional signals and said transformed residual ambient HOA component,

said method including the steps:

-   perceptually decoding said perceptually encoded dominant directional signals and said perceptually encoded transformed residual ambient HOA component;
-   inverse transforming said perceptually decoded transformed residual ambient HOA component so as to get an HOA domain representation;
-   performing an order extension of said inverse transformed residual ambient HOA component so as to establish an original-order ambient HOA component;
-   composing said perceptually decoded dominant directional signals, said direction information and said original-order extended ambient HOA component so as to get an HOA signal representation.
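The order reduction on the compression side and the order extension on the decompression side can be illustrated as follows. This is only a sketch under two assumptions that are not stated in the text above: the coefficient sequences are stored in order of increasing n (so that the first (N_red+1)² rows belong to orders n ≤ N_red), and the order extension is realised by appending all-zero sequences. The names `reduce_order` and `extend_order` are hypothetical.

```python
from typing import List

Frame = List[List[float]]  # rows: HOA coefficient sequences, columns: samples


def num_coeffs(order: int) -> int:
    """Number of HOA coefficient sequences for a given order, (N + 1)^2."""
    return (order + 1) ** 2


def reduce_order(ambient: Frame, reduced_order: int) -> Frame:
    """Keep only the sequences of order n <= reduced_order.

    Assumes rows are sorted by increasing order n (an assumption of this sketch).
    """
    keep = num_coeffs(reduced_order)
    if keep > len(ambient):
        raise ValueError("reduced order exceeds the order of the input")
    return [row[:] for row in ambient[:keep]]


def extend_order(ambient_reduced: Frame, original_order: int) -> Frame:
    """Assumed order extension: append zero sequences for the dropped orders."""
    total = num_coeffs(original_order)
    if total < len(ambient_reduced):
        raise ValueError("original order is lower than the reduced order")
    n_samples = len(ambient_reduced[0]) if ambient_reduced else 0
    padding = [[0.0] * n_samples for _ in range(total - len(ambient_reduced))]
    return [row[:] for row in ambient_reduced] + padding
```

With an original order of 4 and a reduced order of 2, for instance, only 9 of the 25 ambient coefficient sequences would be passed on to the spatial transform and the perceptual coder.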

In principle the inventive apparatus is suited for compressing a Higher Order Ambisonics HOA signal representation, said apparatus including:

-   means being adapted for estimating dominant directions, wherein said dominant direction estimation is dependent on a directional power distribution of the energetically dominant HOA components;
-   means being adapted for decomposing or decoding the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals;
-   means being adapted for compressing said residual ambient component by reducing its order as compared to its original order;
-   means being adapted for transforming said residual ambient HOA component of reduced order to the spatial domain;
-   means being adapted for perceptually encoding said dominant directional signals and said transformed residual ambient HOA component.

In principle the inventive apparatus is suited for decompressing a Higher Order Ambisonics HOA signal representation that was compressed by the steps:

-   estimating dominant directions, wherein said dominant direction estimation is dependent on a directional power distribution of the energetically dominant HOA components;
-   decomposing or decoding the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals;
-   compressing said residual ambient component by reducing its order as compared to its original order;
-   transforming said residual ambient HOA component of reduced order to the spatial domain;
-   perceptually encoding said dominant directional signals and said transformed residual ambient HOA component,

said apparatus including:

-   means being adapted for perceptually decoding said perceptually encoded dominant directional signals and said perceptually encoded transformed residual ambient HOA component;
-   means being adapted for inverse transforming said perceptually decoded transformed residual ambient HOA component so as to get an HOA domain representation;
-   means being adapted for performing an order extension of said inverse transformed residual ambient HOA component so as to establish an original-order ambient HOA component;
-   means being adapted for composing said perceptually decoded dominant directional signals, said direction information and said original-order extended ambient HOA component so as to get an HOA signal representation.

Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.

DRAWINGS

Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:

FIG. 1 Normalised dispersion function v_(N)(Θ) for different Ambisonics orders N and for angles Θ∈[0,π];

FIG. 2 block diagram of the compression processing according to the invention;

FIG. 3 block diagram of the decompression processing according to the invention.

EXEMPLARY EMBODIMENTS

Ambisonics signals describe sound fields within source-free areas using Spherical Harmonics (SH) expansion. The feasibility of this description can be attributed to the physical property that the temporal and spatial behaviour of the sound pressure is essentially determined by the wave equation.

Wave Equation and Spherical Harmonics Expansion

For a more detailed description of Ambisonics, in the following a spherical coordinate system is assumed, where a point in space x=(r,θ,φ)^(T) is represented by a radius r>0 (i.e. the distance to the coordinate origin), an inclination angle θ∈[0,π] measured from the polar axis z, and an azimuth angle φ∈[0,2π[ measured in the x-y plane from the x axis. In this spherical coordinate system the wave equation for the sound pressure p(t,x) within a connected source-free area, where t denotes time, is given in the textbook of Earl G. Williams, “Fourier Acoustics”, vol. 93 of Applied Mathematical Sciences, Academic Press, 1999:

$\begin{matrix}{{{\frac{1}{r^{2}}\begin{bmatrix}\begin{matrix}{{\frac{\partial\;}{\partial r}\left( {r^{2}\frac{\partial{p\left( {t,x} \right)}}{\partial r}} \right)} +} \\{{\frac{1}{\sin \; \theta}\frac{\partial\;}{\partial\theta}\left( {\sin \; \theta \frac{\partial{p\left( {t,x} \right)}}{\partial\theta}} \right)} +}\end{matrix} \\{\frac{1}{\sin^{2}\theta}\frac{\partial^{2}{p\left( {t,x} \right)}}{\partial\varphi^{2}}}\end{bmatrix}} - {\frac{1}{c_{s}^{2}}\frac{\partial^{2}{p\left( {t,x} \right)}}{\partial t^{2}}}} = 0} & (1)\end{matrix}$

with c_(s) indicating the speed of sound. As a consequence, the Fourier transform of the sound pressure with respect to time

$\begin{matrix}{{p\left( {\omega,x} \right)}:={\mathcal{F}_{t}\left\{ {p\left( {t,x} \right)} \right\}}} & (2) \\{{:={\int_{- \infty}^{\infty}{{p\left( {t,x} \right)}^{{- }\; \omega \; t}\ {t}}}},} & (3)\end{matrix}$

where i denotes the imaginary unit, may be expanded into the series of SH according to the Williams textbook:

P(kc _(s),(r,θ,φ)^(T))=Σ_(n=0) ^(∞)Σ_(m=−n) ^(n) p _(n) ^(m)(kr)Y _(n)^(m)(θ,φ).  (4)

It should be noted that this expansion is valid for all points x within a connected source-free area, which corresponds to the region of convergence of the series.

In eq. (4), k denotes the angular wave number defined by

$\begin{matrix}{k:=\frac{\omega}{c_{s}}} & (5)\end{matrix}$

and p_(n) ^(m)(kr) indicates the SH expansion coefficients, which depend only on the product kr.

Further, Y_(n) ^(m)(θ,φ) are the SH functions of order n and degree m:

$\begin{matrix}{Y_{n}^{m}(\theta,\varphi) := \sqrt{\frac{(2n + 1)}{4\pi}\frac{(n - m)!}{(n + m)!}}\; P_{n}^{m}(\cos\theta)\, e^{i m \varphi},} & (6)\end{matrix}$

where P_(n) ^(m)(cos θ) denote the associated Legendre functions and (•)! indicates the factorial.

The associated Legendre functions for non-negative degree indices m are defined through the Legendre polynomials P_(n)(x) by

$\begin{matrix}{{P_{n}^{m}(x)}:={{\left( {- 1} \right)^{m}\left( {1 - x^{2}} \right)^{\frac{m}{2}}\frac{^{m}}{x^{m}}{P_{n}(x)}\mspace{14mu} {for}\mspace{14mu} m} \geq 0.}} & (7)\end{matrix}$

For negative degree indices, i.e. m<0, the associated Legendre functions are defined by

$\begin{matrix}{{P_{n}^{m}(x)}:={{\left( {- 1} \right)^{m}\frac{\left( {n + m} \right)!}{\left( {n - m} \right)!}{P_{n}^{- m}(x)}\mspace{14mu} {for}\mspace{14mu} m} < 0.}} & (8)\end{matrix}$

The Legendre polynomials P_(n)(x) (n≧0) in turn can be defined using Rodrigues' formula as

$\begin{matrix}{{P_{n}(x)} = {\frac{1}{2^{n}{n!}}\frac{^{n}}{x^{n}}{\left( {x^{2} - 1} \right)^{n}.}}} & (9)\end{matrix}$

In the prior art, e.g. in M. Poletti, “Unified Description of Ambisonics using Real and Complex Spherical Harmonics”, Proceedings of the Ambisonics Symposium 2009, 25-27 Jun. 2009, Graz, Austria, there also exist definitions of the SH functions which deviate from that in eq. (6) by a factor of (−1)^(m) for negative degree indices m.

Alternatively, the Fourier transform of the sound pressure with respect to time can be expressed using real SH functions S_(n) ^(m)(θ,φ) as

P(kc _(s),(r,θ,φ)^(T))=Σ_(n=0) ^(∞)Σ_(m=−n) ^(n) q _(n) ^(m)(kr)S _(n)^(m)(θ,φ).  (10)

In literature, there exist various definitions of the real SH functions (see e.g. the above-mentioned Poletti article). One possible definition, which is applied throughout this document, is given by

$\begin{matrix}{{S_{n}^{m}\left( {\theta,\varphi} \right)}:=\left( {\begin{matrix}{\frac{\left( {- 1} \right)^{m}}{\sqrt{2}}\left\lbrack {{Y_{n}^{m}\left( {\theta,\varphi} \right)} + {Y_{n}^{m*}\left( {\theta,\varphi} \right)}} \right\rbrack} & {{{for}\mspace{14mu} m} > 0} \\{Y_{n}^{m}\left( {\theta,\varphi} \right)} & {{{for}\mspace{14mu} m} = 0} \\{\frac{\left( {- 1} \right)}{\sqrt{2}}\left\lbrack {{Y_{n}^{m}\left( {\theta,\varphi} \right)} - {Y_{n}^{m*}\left( {\theta,\varphi} \right)}} \right\rbrack} & {{{for}\mspace{14mu} m} < 0}\end{matrix},} \right.} & (11)\end{matrix}$

where (•)* denotes complex conjugation. An alternative expression is obtained by inserting eq. (6) into eq. (11):

$\begin{matrix}{{{S_{n}^{m}\left( {\theta,\varphi} \right)} = {\sqrt{\frac{\left( {{2\; n} + 1} \right)}{4\; \pi}\frac{\left( {n - m} \right)!}{\left( {n + m} \right)!}}{P_{n}^{m}\left( {\cos \; \theta} \right)}{{trg}_{m}(\varphi)}}},{with}} & (12) \\{{{trg}_{m}(\varphi)}:=\left( {\begin{matrix}{\left( {- 1} \right)^{m}\sqrt{2}{\cos \left( {m\; \varphi} \right)}} & {{{for}\mspace{14mu} m} > 0} \\1 & {{{for}\mspace{14mu} m} = 0} \\{{- \sqrt{2}}{\sin \left( {m\; \varphi} \right)}} & {{{for}\mspace{14mu} m} < 0}\end{matrix},} \right.} & (13)\end{matrix}$

Although the real SH functions are real-valued per definition, this does not hold for the corresponding expansion coefficients q_(n) ^(m)(kr) in general.

The complex SH functions are related to the real SH functions as follows:

$\begin{matrix}{{Y_{n}^{m}\left( {\theta,\varphi} \right)} = \left( {\begin{matrix}{\frac{q_{n}^{m}({kr})}{\sqrt{2}}\left\lbrack {{S_{n}^{m}\left( {\theta,\varphi} \right)} + {\; {S_{n}^{- m}\left( {\theta,\varphi} \right)}}} \right\rbrack} & {{{for}\mspace{14mu} m} > 0} \\{S_{n}^{0}\left( {\theta,\varphi} \right)} & {{{for}\mspace{14mu} m} = 0} \\{\frac{1}{\; \sqrt{2}}\left\lbrack {{S_{n}^{m}\left( {\theta,\varphi} \right)} + {\; {S_{n}^{- m}\left( {\theta,\varphi} \right)}}} \right\rbrack} & {{{for}\mspace{14mu} m} < 0}\end{matrix}.} \right.} & (14)\end{matrix}$

The complex SH functions Y_(n) ^(m)(θ,φ) as well as the real SH functions S_(n) ^(m)(θ,φ) with the direction vector Ω:=(θ,φ)^(T) form an orthonormal basis for squared integrable complex valued functions on the unit sphere S² in the three-dimensional space, and thus obey the conditions

$\begin{matrix}{{\int_{^{2}}^{\;}{{Y_{n}^{m}(\Omega)}{Y_{n^{\prime}}^{m^{\prime}*}(\Omega)}\ {\Omega}}} = {{\int_{0}^{2\; \pi}{\int_{0}^{\pi}{{Y_{n}^{m}\left( {\theta,\varphi} \right)}{Y_{n^{\prime}}^{m^{\prime}*}\left( {\theta,\varphi} \right)}\sin \; \theta \ {\theta}\ {\varphi}}}} = {\delta_{n - {n\; \prime}}\delta_{m - {m\; \prime}}}}} & (15) \\{\mspace{79mu} {{{\int_{^{2}}^{\;}{{S_{n}^{m}(\Omega)}{S_{n\; \prime}^{m\; \prime}(\Omega)}\ {\Omega}}} = {\delta_{n - {n\; \prime}}\delta_{m - {m\; \prime}}}},}} & (16)\end{matrix}$

where δ denotes the Kronecker delta function. The second result can be derived using eq. (15) and the definition of the real spherical harmonics in eq. (11).

Interior Problem and Ambisonics Coefficients

The purpose of Ambisonics is a representation of a sound field in the vicinity of the coordinate origin. Without loss of generality, this region of interest is here assumed to be a ball of radius R centred in the coordinate origin, which is specified by the set {x|0≦r≦R}. A crucial assumption for the representation is that this ball is supposed to not contain any sound sources. Finding the representation of the sound field within this ball is termed the ‘interior problem’, cf. the above-mentioned Williams textbook.

It can be shown that for the interior problem the SH functions expansion coefficients p_(n) ^(m)(kr) can be expressed as

p _(n) ^(m)(kr)=a _(n) ^(m)(k)j _(n)(kr),  (17)

where j_(n)(.) denote the spherical Bessel functions of the first kind. From eq. (17) it follows that the complete information about the sound field is contained in the coefficients a_(n) ^(m)(k), which are referred to as Ambisonics coefficients.

Similarly, the coefficients of the real SH functions expansion q_(n) ^(m)(kr) can be factorised as

q _(n) ^(m)(kr)=b _(n) ^(m)(k)j _(n)(kr),  (18)

where the coefficients b_(n) ^(m)(k) are referred to as Ambisonics coefficients with respect to the expansion using real-valued SH functions. They are related to a_(n) ^(m)(k) through

$\begin{matrix}{{b_{n}^{m}(k)} = \left( {\begin{matrix}{\frac{1}{\sqrt{2}}\left\lbrack {{\left( {- 1} \right)^{m}{a_{n}^{m}(k)}} + {a_{n}^{- m}(k)}} \right\rbrack} & {{{for}\mspace{14mu} m} > 0} \\{a_{n}^{0}(k)} & {{{for}\mspace{14mu} m} = 0} \\{\frac{1}{\sqrt{2}}\left\lbrack {{a_{n}^{m}(k)} - {\left( {- 1} \right)^{m}{a_{n}^{- m}(k)}}} \right\rbrack} & {{{for}\mspace{14mu} m} < 0}\end{matrix}.} \right.} & (19)\end{matrix}$

Plane Wave Decomposition

The sound field within a sound source-free ball centred in the coordinate origin can be expressed by a superposition of an infinite number of plane waves of different angular wave numbers k, impinging on the ball from all possible directions, cf. the above-mentioned Rafaely “Plane-wave decomposition . . . ” article. Assuming that the complex amplitude of a plane wave with angular wave number k from the direction Ω₀ is given by D(k,Ω₀), it can be shown in a similar way by using eq. (11) and eq. (19) that the corresponding Ambisonics coefficients with respect to the real SH functions expansion are given by

b _(n,plane wave) ^(m)(k;Ω ₀)=4πi ^(n) D(k,Ω ₀)S _(n) ^(m)(Ω₀).  (20)

Consequently, the Ambisonics coefficients for the sound field resulting from a superposition of an infinite number of plane waves of angular wave number k are obtained from an integration of eq. (20) over all possible directions Ω₀∈S²:

$\begin{matrix}\begin{matrix}{{b_{n}^{m}(k)} = {\int_{^{2}}^{\;}{{b_{n,{{plane}\mspace{11mu} {wave}}}^{m}\left( {k;\Omega_{0}} \right)}\ {\Omega_{0}}}}} \\{= {4\; \pi \; i^{n}{\int_{^{2}}^{\;}{{D\left( {k,\Omega_{0}} \right)}{S_{n}^{m}\left( \Omega_{0} \right)}\ {{\Omega_{0}}.(22)}}}}}\end{matrix} & (21)\end{matrix}$

The function D(k,Ω) is termed ‘amplitude density’ and is assumed to be square integrable on the unit sphere S². It can be expanded into the series of real SH functions as

D(k,Ω)=Σ_(n=0) ^(∞)Σ_(m=−n) ^(n) c _(n) ^(m)(k)S _(n) ^(m)(Ω),  (23)

where the expansion coefficients c_(n) ^(m)(k) are equal to the integral occurring in eq. (22), i.e.

c _(n) ^(m)(k)=∫_(S²) D(k,Ω)S _(n) ^(m)(Ω)dΩ.  (24)

By inserting eq. (24) into eq. (22) it can be seen that the Ambisonics coefficients b_(n) ^(m)(k) are a scaled version of the expansion coefficients c_(n) ^(m)(k), i.e.

b _(n) ^(m)(k)=4πi ^(n) c _(n) ^(m)(k).  (25)
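The scaling in eq. (25) depends only on the order n, not on the degree m. A minimal sketch of the conversion in both directions follows; the function names are illustrative, and the powers of i are taken from a lookup table so that they are exact.

```python
import math

# i^n repeats with period 4: 1, i, -1, -i
_I_POWERS = (1 + 0j, 1j, -1 + 0j, -1j)


def ambisonics_from_expansion(c_nm: complex, n: int) -> complex:
    """Eq. (25): b_n^m(k) = 4*pi*i^n * c_n^m(k)."""
    return 4.0 * math.pi * _I_POWERS[n % 4] * c_nm


def expansion_from_ambisonics(b_nm: complex, n: int) -> complex:
    """Inverse of eq. (25): c_n^m(k) = b_n^m(k) / (4*pi*i^n)."""
    return b_nm / (4.0 * math.pi * _I_POWERS[n % 4])
```

Since the factor has constant magnitude 4π, the conversion changes only the phase between orders; it does not alter the relative energy distribution over the orders.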

When applying the inverse Fourier transform with respect to time to the scaled Ambisonics coefficients c_(n) ^(m)(k) and to the amplitude density function D(k,Ω), the corresponding time domain quantities

$\begin{matrix}{{{\overset{\sim}{c}}_{n}^{m}(t)}:={{\mathcal{F}_{t}^{- 1}\left\{ {c_{n}^{m}\left( \frac{\omega}{c_{s}} \right)} \right\}} = {\frac{1}{2\; \pi}{\int_{- \infty}^{\infty}{{c_{n}^{m}\left( \frac{\omega}{c_{s}} \right)}^{\; \omega \; t}\ {\omega}}}}}} & (26) \\{{d\left( {t,\Omega} \right)}:={{\mathcal{F}_{t}^{- 1}\left\{ {D\left( {\frac{\omega}{c_{s}},\Omega} \right)} \right\}} = {\frac{1}{2\; \pi}{\int_{- \infty}^{\infty}{{D\left( {\frac{\omega}{c_{s}},\Omega} \right)}^{\; \omega \; t}\ {\omega}}}}}} & (27)\end{matrix}$

are obtained. Then, in the time domain, eq. (24) can be formulated as

{tilde over (c)} _(n) ^(m)(t)=∫_(S²) d(t,Ω)S _(n) ^(m)(Ω)dΩ.  (28)

The time domain directional signal d(t,Ω) may be represented by a real SH function expansion according to

d(t,Ω)=Σ_(n=0) ^(∞)Σ_(m=−n) ^(n) {tilde over (c)} _(n) ^(m)(t)S _(n)^(m)(Ω).  (29)

Using the fact that the SH functions S_(n) ^(m)(Ω) are real-valued, the complex conjugate of d(t,Ω) can be expressed by

d*(t,Ω)=Σ_(n=0) ^(∞)Σ_(m=−n) ^(n) {tilde over (c)} _(n) ^(m)*(t)S _(n)^(m)(Ω).  (30)

Assuming the time domain signal d(t,Ω) to be real-valued, i.e. d(t,Ω)=d*(t,Ω), it follows from the comparison of eq. (29) with eq. (30) that the coefficients {tilde over (c)}_(n) ^(m)(t) are real-valued in that case, i.e. {tilde over (c)}_(n) ^(m)(t)={tilde over (c)}_(n) ^(m)*(t).

The coefficients {tilde over (c)}_(n) ^(m)(t) will be referred to as scaled time domain Ambisonics coefficients in the following.

In the following it is also assumed that the sound field representation is given by these coefficients, which will be described in more detail in the below section dealing with the compression.

It is noted that the time domain HOA representation by the coefficients {tilde over (c)}_(n) ^(m)(t) used for the processing according to the invention is equivalent to a corresponding frequency domain HOA representation c_(n) ^(m)(k). Therefore the described compression and decompression can be equivalently realised in the frequency domain with minor respective modifications of the equations.

Spatial Resolution with Finite Order

In practice the sound field in the vicinity of the coordinate origin is described using only a finite number of Ambisonics coefficients c_(n) ^(m)(k) of order n≦N. Computing the amplitude density function from the truncated series of SH functions according to

D _(N)(k,Ω):=Σ_(n=0) ^(N)Σ_(m=−n) ^(n) c _(n) ^(m)(k)S _(n)^(m)(Ω)  (31)

introduces a kind of spatial dispersion compared to the true amplitude density function D(k,Ω), cf. the above-mentioned “Plane-wave decomposition . . . ” article. This can be seen by computing the amplitude density function for a single plane wave from the direction Ω₀ using eq. (31):

$\begin{aligned}
D_{N}(k,\Omega) &= \sum_{n=0}^{N}\sum_{m=-n}^{n} \frac{1}{4\pi i^{n}} \cdot b_{n,\text{plane wave}}^{m}(k;\Omega_{0})\, S_{n}^{m}(\Omega) && (32) \\
&= D(k,\Omega_{0}) \sum_{n=0}^{N}\sum_{m=-n}^{n} S_{n}^{m}(\Omega_{0})\, S_{n}^{m}(\Omega) && (33) \\
&= D(k,\Omega_{0}) \sum_{n=0}^{N}\sum_{m=-n}^{n} Y_{n}^{m*}(\Omega_{0})\, Y_{n}^{m}(\Omega) && (34) \\
&= D(k,\Omega_{0}) \sum_{n=0}^{N} \frac{2n+1}{4\pi} P_{n}(\cos\Theta) && (35) \\
&= D(k,\Omega_{0}) \left[ \frac{N+1}{4\pi(\cos\Theta - 1)} \left( P_{N+1}(\cos\Theta) - P_{N}(\cos\Theta) \right) \right] && (36) \\
&= D(k,\Omega_{0})\, v_{N}(\Theta) && (37)
\end{aligned}$

with

$\begin{matrix}{v_{N}(\Theta) := \frac{N+1}{4\pi(\cos\Theta - 1)}\left( P_{N+1}(\cos\Theta) - P_{N}(\cos\Theta) \right),} & (38)\end{matrix}$

where Θ denotes the angle between the two vectors pointing towards the directions Ω and Ω₀ satisfying the property

cos Θ=cos θ cos θ₀+cos(φ−φ₀)sin θ sin θ₀.  (39)

In eq. (34) the Ambisonics coefficients for a plane wave given in eq. (20) are employed, while in equations (35) and (36) some mathematical theorems are exploited, cf. the above-mentioned “Plane-wave decomposition . . . ” article. The property in eq. (33) can be shown using eq. (14).

Comparing eq. (37) to the true amplitude density function

$\begin{matrix}{{{D\left( {k,\Omega} \right)} = {{D\left( {k,\Omega_{0}} \right)}\frac{\delta (\Theta)}{2\; \pi}}},} & (40)\end{matrix}$

where δ(•) denotes the Dirac delta function, the spatial dispersion becomes obvious from the replacement of the scaled Dirac delta function by the dispersion function v_(N)(Θ) which, after having been normalised by its maximum value, is illustrated in FIG. 1 for different Ambisonics orders N and angles Θ∈[0,π].

Because the first zero of v_(N)(Θ) is located approximately at

$\frac{\pi}{N}$

for N≧4 (see the above-mentioned “Plane-wave decomposition . . . ” article), the dispersion effect is reduced (and thus the spatial resolution is improved) with increasing Ambisonics order N.
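The behaviour of the dispersion function can be checked numerically. The following is a minimal sketch, assuming only the Legendre sum of eq. (35) and the closed form of eq. (38); the function names and the grid-scan root search are illustrative choices and not part of the described method.

```python
import math

def legendre(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] using the three-term recurrence."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]

def v_sum(N, theta):
    """Dispersion function as the finite Legendre sum of eq. (35)."""
    p = legendre(N, math.cos(theta))
    return sum((2 * n + 1) / (4 * math.pi) * p[n] for n in range(N + 1))

def v_closed(N, theta):
    """Dispersion function in the closed form of eq. (38); needs theta != 0."""
    x = math.cos(theta)
    p = legendre(N + 1, x)
    return (N + 1) / (4 * math.pi * (x - 1)) * (p[N + 1] - p[N])

def first_zero(N, steps=5000):
    """Locate the first sign change of v_N on (0, pi] by a simple grid scan."""
    prev = v_sum(N, 0.0)
    for i in range(1, steps + 1):
        theta = math.pi * i / steps
        cur = v_sum(N, theta)
        if prev > 0 >= cur:
            return theta
        prev = cur
    return None

# Compare the located first zero with the approximation pi/N.
for N in (4, 6, 8):
    print(N, first_zero(N), math.pi / N)
```

The printed pairs allow the quality of the approximation π/N to be judged for each order; the sum form is used for the scan because the closed form is undefined at Θ=0.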

For N→∞ the dispersion function v_(N)(Θ) converges to the scaled Dirac delta function. This can be seen if the completeness relation for the Legendre polynomials

$\begin{matrix}{{\sum\limits_{n = 0}^{\infty}\; {\frac{{2\; n} + 1}{2}{P_{n}(x)}{P_{n}\left( x^{\prime} \right)}}} = {\delta \left( {x - x^{\prime}} \right)}} & (41)\end{matrix}$

is used together with eq. (35) to express the limit of v_(N)(Θ) for N→∞ as

$\begin{aligned}
\lim\limits_{N\rightarrow\infty} v_{N}(\Theta) &= \frac{1}{2\pi}\sum\limits_{n=0}^{\infty}\frac{2n+1}{2} P_{n}(\cos\Theta) && (42) \\
&= \frac{1}{2\pi}\sum\limits_{n=0}^{\infty}\frac{2n+1}{2} P_{n}(\cos\Theta)\, P_{n}(1) && (43) \\
&= \frac{1}{2\pi}\delta(\cos\Theta - 1) && (44) \\
&= \frac{1}{2\pi}\delta(\Theta). && (45)
\end{aligned}$

When defining the vector of real SH functions of order n≦N by

S(Ω):=(S ₀ ⁰(Ω),S ₁ ⁻¹(Ω),S ₁ ⁰(Ω),S ₁ ¹(Ω),S ₂ ⁻²(Ω), . . . ,S _(N) ^(N)(Ω))^(T)∈ℝ^(O),  (46)

where O=(N+1)² and where (.)^(T) denotes transposition, the comparison of eq. (37) with eq. (33) shows that the dispersion function can be expressed through the scalar product of two real SH vectors as

v _(N)(Θ)=S ^(T)(Ω)S(Ω₀).  (47)

The dispersion can be equivalently expressed in time domain as

$\begin{matrix}{{d_{N}\left( {t,\Omega} \right)}:={\sum\limits_{n = 0}^{N}{\sum\limits_{m = {- n}}^{n}{{{\overset{\sim}{c}}_{n}^{m}(t)}{S_{n}^{m}(\Omega)}}}}} & {{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}(48)} \\{= {{d\left( {t,\Omega_{0}} \right)}{{v_{N}(\Theta)}.}}} & {(49)}\end{matrix}$

Sampling

For some applications it is desirable to determine the scaled time domain Ambisonics coefficients {tilde over (c)}_(n) ^(m)(t) from the samples of the time domain amplitude density function d(t,Ω) at a finite number J of discrete directions Ω_(j). The integral in eq. (28) is then approximated by a finite sum according to B. Rafaely, “Analysis and Design of Spherical Microphone Arrays”, IEEE Transactions on Speech and Audio Processing, vol. 13, no. 1, pp. 135-143, January 2005:

{tilde over (c)} _(n) ^(m)(t)≈Σ_(j=1) ^(J) g _(j) ·d(t,Ω _(j))S _(n) ^(m)(Ω_(j)),  (50)

where the g_(j) denote some appropriately chosen sampling weights. In contrast to the “Analysis and Design . . . ” article, approximation (50) refers to a time domain representation using real SH functions rather than to a frequency domain representation using complex SH functions. A necessary condition for approximation (50) to become exact is that the amplitude density is of limited harmonic order N, meaning that

{tilde over (c)} _(n) ^(m)(t)=0 for n>N.  (51)

If this condition is not met, approximation (50) suffers from spatial aliasing errors, cf. B. Rafaely, “Spatial Aliasing in Spherical Microphone Arrays”, IEEE Transactions on Signal Processing, vol. 55, no. 3, pp. 1003-1010, March 2007. A second necessary condition requires the sampling points Ω_(j) and the corresponding weights to fulfil the corresponding conditions given in the “Analysis and Design . . . ” article:

Σ_(j=1) ^(J) g _(j) S _(n′) ^(m′)(Ω_(j))S _(n) ^(m)(Ω_(j))=δ_(n-n′)δ_(m-m′) for n,n′≦N.  (52)

The conditions (51) and (52) jointly are sufficient for exact sampling.

The sampling condition (52) consists of a set of linear equations, which can be formulated compactly using a single matrix equation as

ΨGΨ ^(H) =I,  (53)

where Ψ indicates the mode matrix defined by

Ψ=[S(Ω₁) . . . S(Ω_(J))]∈ℝ^(O×J)  (54)

and G denotes the matrix with the weights on its diagonal, i.e.

G:=diag(g ₁ , . . . ,g _(J)).  (55)

From eq. (53) it can be seen that a necessary condition for eq. (52) to hold is that the number J of sampling points fulfils J≧O. Collecting the values of the time domain amplitude density at the J sampling points into the vector

w(t):=(d(t,Ω ₁), . . . ,d(t,Ω _(J)))^(T),  (56)

and defining the vector of scaled time domain Ambisonics coefficients by

c(t):=({tilde over (c)} ₀ ⁰(t),{tilde over (c)} ₁ ⁻¹(t),{tilde over (c)} ₁ ⁰(t),{tilde over (c)} ₁ ¹(t),{tilde over (c)} ₂ ⁻²(t), . . . ,{tilde over (c)} _(N) ^(N)(t))^(T),  (57)

both vectors are related through the SH functions expansion (29). This relation provides the following system of linear equations:

w(t)=Ψ^(H) c(t).  (58)

Using the introduced vector notation, the computation of the scaled time domain Ambisonics coefficients from the values of the time domain amplitude density function samples can be written as

c(t)≈ΨGw(t).  (59)

Given a fixed Ambisonics order N, it is often not possible to compute a number J≧O of sampling points Ω_(j) and the corresponding weights such that the sampling condition eq. (52) holds. However, if the sampling points are chosen such that the sampling condition is well approximated, then the rank of the mode matrix Ψ is O and its condition number low. In this case, the pseudo-inverse

Ψ⁺:=(ΨΨ^(H))⁻¹Ψ  (60)

of the mode matrix Ψ exists and a reasonable approximation of the scaled time domain Ambisonics coefficient vector c(t) from the vector of the time domain amplitude density function samples is given by

c(t)≈Ψ⁺ w(t).  (61)

If J=O and the rank of the mode matrix is O, then its pseudo-inverse coincides with its inverse since

Ψ⁺=(ΨΨ^(H))⁻¹Ψ=Ψ^(−H)Ψ⁻¹Ψ=Ψ^(−H)  (62)

If additionally the sampling condition eq. (52) is satisfied, then

Ψ^(−H)=ΨG  (63)

holds and both approximations (59) and (61) are equivalent and exact.
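The exact-sampling case can be illustrated with a small numerical example. The sketch below assumes order N=1 (O=4), a tetrahedral grid with J=4 directions and uniform weights 4π/O, and one common normalisation and sign convention for the real SH functions; these choices, like all function names, are assumptions made for illustration only.

```python
import math

def real_sh_order1(theta, phi):
    """Real SH vector S(Omega) for N = 1 (assumed normalisation and signs)."""
    a = math.sqrt(1.0 / (4.0 * math.pi))
    b = math.sqrt(3.0 / (4.0 * math.pi))
    return [a,
            b * math.sin(theta) * math.sin(phi),   # S_1^-1
            b * math.cos(theta),                   # S_1^0
            b * math.sin(theta) * math.cos(phi)]   # S_1^1

# Tetrahedral sampling directions as (inclination, azimuth) pairs; J = O = 4.
t = math.acos(1.0 / math.sqrt(3.0))
DIRECTIONS = [(t, math.pi / 4), (t, -3 * math.pi / 4),
              (math.pi - t, 3 * math.pi / 4), (math.pi - t, -math.pi / 4)]
O = J = 4
g = 4.0 * math.pi / O   # uniform sampling weights (assumption)

# Mode matrix Psi (O x J): column j holds S(Omega_j), cf. eq. (54).
PSI = [[real_sh_order1(*DIRECTIONS[j])[o] for j in range(J)] for o in range(O)]

def gram():
    """Psi G Psi^H, which eq. (53) requires to be the identity matrix."""
    return [[sum(g * PSI[a][j] * PSI[b][j] for j in range(J))
             for b in range(O)] for a in range(O)]

def to_spatial(c):
    """w = Psi^H c, eq. (58); Psi is real, so Psi^H is its transpose."""
    return [sum(PSI[o][j] * c[o] for o in range(O)) for j in range(J)]

def to_hoa(w):
    """c = Psi G w, eq. (59)."""
    return [sum(PSI[o][j] * g * w[j] for j in range(J)) for o in range(O)]

c = [0.5, -1.0, 2.0, 0.25]          # arbitrary coefficient vector
c_back = to_hoa(to_spatial(c))      # round trip through the spatial domain
```

For this grid the sampling condition holds exactly, so the round trip returns the original coefficient vector up to rounding.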

Vector w(t) can be interpreted as a vector of spatial time domain signals. The transform from the HOA domain to the spatial domain can be performed e.g. by using eq. (58). This kind of transform is termed ‘Spherical Harmonic Transform’ (SHT) in this application and is used when the ambient HOA component of reduced order is transformed to the spatial domain. It is implicitly assumed that the spatial sampling points Ω_(j) for the SHT approximately satisfy the sampling condition in eq. (52) with

$g_{j} \approx \frac{4\pi}{O}$

for j=1, . . . , J and that J=O.

Under these assumptions the SHT matrix satisfies

$\Psi^{- 1} \approx {\frac{4\pi}{O}{\Psi^{H}.}}$

If the absolute scaling of the SHT is not important, the constant

$\frac{4\pi}{O}$

can be neglected.

Compression

This invention is related to the compression of a given HOA signal representation. As mentioned above, the HOA representation is decomposed into a predefined number of dominant directional signals in the time domain and an ambient component in HOA domain, followed by compression of the HOA representation of the ambient component by reducing its order. This operation exploits the assumption, which is supported by listening tests, that the ambient sound field component can be represented with sufficient accuracy by a HOA representation with a low order. The extraction of the dominant directional signals ensures that, following that compression and a corresponding decompression, a high spatial resolution is retained.

After the decomposition, the ambient HOA component of reduced order is transformed to the spatial domain, and is perceptually coded together with the directional signals as described in section Exemplary embodiments of patent application EP 10306472.1.

The compression processing includes two successive steps, which are depicted in FIG. 2. The exact definitions of the individual signals are described in below section Details of the compression.

In the first step or stage shown in FIG. 2 a, in a dominant direction estimator 22 dominant directions are estimated and a decomposition of the Ambisonics signal C(l) into a directional and a residual or ambient component is performed, where l denotes the frame index. The directional component is calculated in a directional signal computation step or stage 23, whereby the Ambisonics representation is converted to time domain signals represented by a set of D conventional directional signals X(l) with corresponding directions Ω _(DOM)(l). The residual ambient component is calculated in an ambient HOA component computation step or stage 24, and is represented by HOA domain coefficients C_(A)(l).

In the second step shown in FIG. 2 b, a perceptual coding of the directional signals X(l) and the ambient HOA component C_(A)(l) is carried out as follows:

-   -   The conventional time domain directional signals X(l) can be individually compressed in a perceptual coder 27 using any known perceptual compression technique.
-   -   The compression of the ambient HOA domain component C_(A)(l) is carried out in two substeps or stages.
-   -   The first substep or stage 25 performs a reduction of the original Ambisonics order N to N_(RED), e.g. N_(RED)=2, resulting in the ambient HOA component C_(A,RED)(l). Here, the assumption is exploited that the ambient sound field component can be represented with sufficient accuracy by HOA with a low order.
-   -   The second substep or stage 26 is based on a compression described in patent application EP 10306472.1. The O_(RED):=(N_(RED)+1)² HOA signals C_(A,RED)(l) of the ambient sound field component, which were computed at substep/stage 25, are transformed into O_(RED) equivalent signals W_(A,RED)(l) in the spatial domain by applying a Spherical Harmonic Transform, resulting in conventional time domain signals which can be input to a bank of parallel perceptual codecs 27. Any known perceptual coding or compression technique can be applied. The encoded directional signals {hacek over (X)}(l) and the order-reduced encoded spatial domain signals {hacek over (W)}_(A,RED)(l) are output and can be transmitted or stored.

Advantageously, the perceptual compression of all time domain signals X(l) and W_(A,RED)(l) can be performed jointly in a perceptual coder 27 in order to improve the overall coding efficiency by exploiting the potentially remaining inter-channel correlations.

Decompression

The decompression processing for a received or replayed signal is depicted in FIG. 3. Like the compression processing, it includes two successive steps.

In the first step or stage shown in FIG. 3 a, a perceptual decoding or decompression of the encoded directional signals {hacek over (X)}(l) and of the order-reduced encoded spatial domain signals {hacek over (W)}_(A,RED)(l) is carried out in a perceptual decoding step or stage 31, where {circumflex over (X)}(l) represents the directional component and Ŵ_(A,RED)(l) represents the ambient HOA component. The perceptually decoded or decompressed spatial domain signals Ŵ_(A,RED)(l) are transformed in an inverse spherical harmonic transformer 32 to an HOA domain representation Ĉ_(A,RED)(l) of order N_(RED) via an inverse Spherical Harmonics transform. Thereafter, in an order extension step or stage 33 an appropriate HOA representation Ĉ_(A)(l) of order N is estimated from Ĉ_(A,RED)(l) by order extension.

In the second step or stage shown in FIG. 3 b, the total HOA representation Ĉ(l) is re-composed in an HOA signal assembler 34 from the directional signals {circumflex over (X)}(l) and the corresponding direction information {circumflex over (Ω)}_(DOM)(l) as well as from the original-order ambient HOA component Ĉ_(A)(l).

Achievable Data Rate Reduction

A problem solved by the invention is the considerable reduction of the data rate as compared to existing compression methods for HOA representations. In the following the achievable compression rate compared to the non-compressed HOA representation is discussed. The compression rate results from the comparison of the data rate required for the transmission of a non-compressed HOA signal C(l) of order N with the data rate required for the transmission of a compressed signal representation consisting of D perceptually coded directional signals X(l) with corresponding directions Ω _(DOM)(l) and O_(RED) perceptually coded spatial domain signals W_(A,RED)(l) representing the ambient HOA component.

For the transmission of the non-compressed HOA signal C(l) a data rate of O·f_(S)·N_(b) is required. In contrast, the transmission of D perceptually coded directional signals X(l) requires a data rate of D·f_(b,COD), where f_(b,COD) denotes the bit rate of one perceptually coded signal. Similarly, the transmission of the O_(RED) perceptually coded spatial domain signals W_(A,RED)(l) requires a bit rate of O_(RED)·f_(b,COD).

The directions Ω _(DOM)(l) are assumed to be computed based on a much lower rate compared to the sampling rate f_(S), i.e. they are assumed to be fixed for the duration of a signal frame consisting of B samples, e.g. B=1200 for a sampling rate of f_(S)=48 kHz, and the corresponding data rate share can be neglected for the computation of the total data rate of the compressed HOA signal.

Therefore, the transmission of the compressed representation requires a data rate of approximately (D+O_(RED))·f_(b,COD). Consequently, the compression rate r_(COMPR) is

$\begin{matrix}{r_{COMPR} \approx {\frac{O \cdot f_{s} \cdot N_{b}}{\left( {D + O_{RED}} \right) \cdot f_{b,{COD}}}.}} & (64)\end{matrix}$

For example, the compression of an HOA representation of order N=4 employing a sampling rate f_(S)=48 kHz and N_(b)=16 bits per sample to a representation with D=3 dominant directions using a reduced HOA order N_(RED)=2 and a bit rate of

$64\frac{kbits}{s}$

will result in a compression rate of r_(COMPR)≈25. The transmission of the compressed representation requires a data rate of approximately

$768{\frac{kbits}{s}.}$
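The arithmetic behind this example follows directly from eq. (64) and can be reproduced as below. This is a minimal sketch; the function and parameter names are chosen for illustration, and the direction side information is neglected as stated above.

```python
def hoa_compression_ratio(N, f_s, n_b, D, N_red, f_b_cod):
    """Return (raw bit rate, compressed bit rate, ratio) according to eq. (64)."""
    O = (N + 1) ** 2              # number of HOA coefficients at order N
    O_red = (N_red + 1) ** 2      # number of ambient signals at reduced order
    raw = O * f_s * n_b           # non-compressed HOA data rate in bit/s
    compressed = (D + O_red) * f_b_cod
    return raw, compressed, raw / compressed

raw, compressed, ratio = hoa_compression_ratio(
    N=4, f_s=48_000, n_b=16, D=3, N_red=2, f_b_cod=64_000)
print(raw, compressed, ratio)     # 19200000 768000 25.0
```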

Reduced Probability for Occurrence of Coding Noise Unmasking

As explained in the Background section, the perceptual compression of spatial domain signals described in patent application EP 10306472.1 suffers from remaining cross correlations between the signals, which may lead to unmasking of perceptual coding noise. According to the invention, the dominant directional signals are first extracted from the HOA sound field representation before being perceptually coded. This means that, when composing the HOA representation, after perceptual decoding the coding noise has exactly the same spatial directivity as the directional signals. In particular, the contributions of the coding noise as well as that of the directional signal to any arbitrary direction are deterministically described by the spatial dispersion function explained in section Spatial resolution with finite order. In other words, at any time instant the HOA coefficients vector representing the coding noise is exactly a multiple of the HOA coefficients vector representing the directional signal. Thus, an arbitrarily weighted sum of the noisy HOA coefficients will not lead to any unmasking of the perceptual coding noise.

Further, the ambient component of reduced order is processed exactly as proposed in EP 10306472.1, but because per definition the spatial domain signals of the ambient component have a rather low correlation between each other, the probability for perceptual noise unmasking is low.

Improved Direction Estimation

The inventive direction estimation is dependent on the directional power distribution of the energetically dominant HOA component. The directional power distribution is computed from the rank-reduced correlation matrix of the HOA representation, which is obtained by eigenvalue decomposition of the correlation matrix of the HOA representation. Compared to the direction estimation used in the above-mentioned “Plane-wave decomposition . . . ” article, it offers the advantage of being more precise, since focusing on the energetically dominant HOA component instead of using the complete HOA representation for the direction estimation reduces the spatial blurring of the directional power distribution.

Compared to the direction estimation proposed in the above-mentioned “The Application of Compressive Sampling to the Analysis and Synthesis of Spatial Sound Fields” and “Time Domain Reconstruction of Spatial Sound Fields Using Compressed Sensing” articles, it offers the advantage of being more robust. The reason is that the decomposition of the HOA representation into the directional and ambient component can hardly ever be accomplished perfectly, so that there remains a small ambient component amount in the directional component. Then, compressive sampling methods like in these two articles fail to provide reasonable direction estimates due to their high sensitivity to the presence of ambient signals.

Advantageously, the inventive direction estimation does not suffer from this problem.

Alternative Applications of the HOA Representation Decomposition

The described decomposition of the HOA representation into a number of directional signals with related direction information and an ambient component in HOA domain can be used for a signal-adaptive DirAC-like rendering of the HOA representation according to that proposed in the above-mentioned Pulkki article “Spatial Sound Reproduction with Directional Audio Coding”.

Each HOA component can be rendered differently because the physical characteristics of the two components are different. For example, the directional signals can be rendered to the loudspeakers using signal panning techniques like Vector Based Amplitude Panning (VBAP), cf. V. Pulkki, “Virtual Sound Source Positioning Using Vector Base Amplitude Panning”, Journal of Audio Eng. Society, vol. 45, no. 6, pp. 456-466, 1997. The ambient HOA component can be rendered using known standard HOA rendering techniques.

Such rendering is not restricted to Ambisonics representation of order ‘1’ and can thus be seen as an extension of the DirAC-like rendering to HOA representations of order N>1.

The estimation of several directions from an HOA signal representation can be used for any related kind of sound field analysis.

The following sections describe in more detail the signal processing steps.

Compression

Definition of Input Format

As input, the scaled time domain HOA coefficients {tilde over (c)}_(n) ^(m)(t) defined in eq. (26) are assumed to be sampled at a rate

$f_{S} = {\frac{1}{T_{S}}.}$

A vector c(j) is defined to be composed of all coefficients belonging to the sampling time t=jT_(S), j∈ℤ, according to

c(j):=[{tilde over (c)} ₀ ⁰(jT _(S)),{tilde over (c)} ₁ ⁻¹(jT _(S)),{tilde over (c)} ₁ ⁰(jT _(S)),{tilde over (c)} ₁ ¹(jT _(S)),{tilde over (c)} ₂ ⁻²(jT _(S)), . . . ,{tilde over (c)} _(N) ^(N)(jT _(S))]^(T)∈ℝ^(O).  (65)

Framing

The incoming vectors c(j) of scaled HOA coefficients are framed in framing step or stage 21 into non-overlapping frames of length B according to

C(l):=[c(lB+1)c(lB+2) . . . c(lB+B)]∈ℝ^(O×B).  (66)

Assuming a sampling rate of f_(s)=48 kHz, an appropriate frame length is B=1200 samples corresponding to a frame duration of 25 ms.
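The framing of eq. (66) can be sketched as follows. The representation of the coefficient vectors as nested Python lists, the function name, and the dropping of trailing samples that do not fill a complete frame are assumptions made for illustration.

```python
def frame_hoa(vectors, B):
    """Split coefficient vectors c(1), c(2), ... into frames C(0), C(1), ...

    `vectors` is a list of length-O lists, with vectors[0] taken to be c(1).
    Each frame is returned as an O x B nested list, cf. eq. (66). Trailing
    samples that do not fill a whole frame are dropped (assumption).
    """
    O = len(vectors[0])
    frames = []
    for l in range(len(vectors) // B):
        block = vectors[l * B:(l + 1) * B]        # c(lB+1) ... c(lB+B)
        frames.append([[block[b][o] for b in range(B)] for o in range(O)])
    return frames

# Frame duration for B = 1200 samples at 48 kHz, in seconds.
frame_duration = 1200 / 48_000
```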

Estimation of Dominant Directions

For the estimation of the dominant directions the following correlation matrix

$\begin{matrix}{{B(l)}:={{\frac{1}{LB}{\sum\limits_{{l\; \prime} = 0}^{L - 1}{{C\left( {l - l^{\prime}} \right)}{C^{T}\left( {l - l^{\prime}} \right)}}}} \in {{\mathbb{R}}^{O \times O}.}}} & (67)\end{matrix}$

is computed. The summation over the current frame l and L−1 previous frames indicates that the directional analysis is based on long overlapping groups of frames with L·B samples, i.e. for each current frame the content of adjacent frames is taken into consideration. This contributes to the stability of the directional analysis for two reasons: longer frames result in a greater number of observations, and the direction estimates are smoothed due to overlapping frames.

Assuming f_(S)=48 kHz and B=1200, a reasonable value for L is 4 corresponding to an overall frame duration of 100 ms.

Next, an eigenvalue decomposition of the correlation matrix B(l) is determined according to

B(l)=V(l)Λ(l)V ^(T)(l),  (68)

wherein matrix V(l) is composed of the eigenvectors v_(i)(l), 1≦i≦O, as

V(l):=[v ₁(l)v ₂(l) . . . v _(O)(l)]∈ℝ^(O×O)  (69)

and matrix Λ(l) is a diagonal matrix with the corresponding eigenvalues λ_(i)(l), 1≦i≦O, on its diagonal:

Λ(l):=diag(λ₁(l),λ₂(l), . . . ,λ_(O)(l))∈ℝ^(O×O).  (70)

It is assumed that the eigenvalues are indexed in a non-ascending order, i.e.

λ₁(l)≧λ₂(l)≧ . . . ≧λ_(O)(l).  (71)

Thereafter, the index set {1, . . . , {tilde over (J)}(l)} of dominant eigenvalues is computed. One possibility to manage this is defining a desired minimal broadband directional-to-ambient power ratio DAR_(MIN) and then determining {tilde over (J)}(l) such that

$\begin{matrix}{{{10{\log_{10}\left( \frac{\lambda_{i}(l)}{\lambda_{1}(l)} \right)}} \geq {{- {DAR}_{{MI}\; N}}{\forall{i \leq {{\overset{\sim}{}(l)}\mspace{14mu} {and}}}}}}\text{}{{{10{\log_{10}\left( \frac{\lambda_{i}(l)}{\lambda_{1}(l)} \right)}} > {{- {DAR}_{{MI}\; N}}\mspace{14mu} {for}\mspace{14mu} i}} = {{\overset{\sim}{}(l)} + 1.}}} & (72)\end{matrix}$

A reasonable choice for DAR_(MIN) is 15 dB. The number of dominant eigenvalues is further constrained to be not greater than D in order to concentrate on no more than D dominant directions. This is accomplished by replacing the index set {1, . . . , {tilde over (J)}(l)} by {1, . . . , J(l)}, where

J(l):=min({tilde over (J)}(l),D).  (73)
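The selection of the dominant eigenvalues can be sketched as follows, assuming the eigenvalues are already sorted in non-ascending order. The function name is illustrative. The cap at D is implemented with `min`, in line with the stated constraint that the number of dominant eigenvalues is not greater than D.

```python
import math

def num_dominant_eigenvalues(eigenvalues, dar_min_db, D):
    """Count eigenvalues within dar_min_db of the largest one, capped at D.

    `eigenvalues` must be sorted in non-ascending order, cf. eq. (71).
    """
    lam1 = eigenvalues[0]
    j_tilde = 0
    for lam in eigenvalues:
        # Stop at the first eigenvalue more than DAR_MIN below the largest.
        if lam <= 0 or 10 * math.log10(lam / lam1) < -dar_min_db:
            break
        j_tilde += 1
    return min(j_tilde, D)

# Ratios to the largest eigenvalue are 0, -3.0, -13.0, -20 and -30 dB.
example = [100.0, 50.0, 5.0, 1.0, 0.1]
print(num_dominant_eigenvalues(example, dar_min_db=15.0, D=2))  # 2
print(num_dominant_eigenvalues(example, dar_min_db=15.0, D=4))  # 3
```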

Next, the J(l)-rank approximation of B(l) is obtained by

B _(J)(l):=V _(J)(l)Λ_(J)(l)V _(J) ^(T)(l), where  (74)

V _(J)(l):=[v ₁(l)v ₂(l) . . . v _(J(l))(l)]∈ℝ^(O×J(l)),  (75)

Λ_(J)(l):=diag(λ₁(l),λ₂(l), . . . ,λ_(J(l))(l))∈ℝ^(J(l)×J(l)).  (76)

This matrix should contain the contributions of the dominant directional components to B(l).

Thereafter, the vector

$\begin{matrix}{{\sigma^{2}(l)}:={{{diag}\left( {\Xi^{T}{B_{}(l)}\Xi} \right)} \in {\mathbb{R}}^{Q}}} & {(77)} \\{= \left( {{S_{1}^{T}{B_{}(l)}S_{1}},\ldots \mspace{14mu},{S_{Q}^{T}{B_{}(l)}S_{Q}}} \right)^{T}} & {{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}(78)}\end{matrix}$

is computed, where E denotes a mode matrix with respect to a high numberof nearly equally distributed test directions Ω_(q):=(θ_(q),φ_(q)),1≦q≦Q, where θ_(q)∈[0,π] denotes the inclination angle θ∈[0,π] measuredfrom the polar axis z and φ_(q)∈[−π,π] denotes the azimuth anglemeasured in the x=y plane from the x axis.

Mode matrix Ξ is defined by

Ξ=[S ₁ S ₂ . . . S _(Q)]∈ℝ^(O×Q)  (79)

with

S _(q) :=[S ₀ ⁰(Ω_(q)),S ₁ ⁻¹(Ω_(q)),S ₁ ⁰(Ω_(q)),S ₁ ¹(Ω_(q)),S ₂ ⁻²(Ω_(q)), . . . ,S _(N) ^(N)(Ω_(q))]^(T)  (80)

for 1≦q≦Q.

The σ_(q) ²(l) elements of σ²(l) are approximations of the powers of plane waves, corresponding to dominant directional signals, impinging from the directions Ω_(q). The theoretical explanation for that is provided in the below section Explanation of direction search algorithm.

From σ²(l) a number {tilde over (D)}(l) of dominant directions Ω_(CURRDOM,{tilde over (d)})(l), 1≦{tilde over (d)}≦{tilde over (D)}(l), for the determination of the directional signal components is computed. The number of dominant directions is thereby constrained to fulfil {tilde over (D)}(l)≦D in order to assure a constant data rate. However, if a variable data rate is allowed, the number of dominant directions can be adapted to the current sound scene.

One possibility to compute the {tilde over (D)}(l) dominant directions is to set the first dominant direction to that with the maximum power, i.e. Ω_(CURRDOM,1)(l)=Ω_(q) ₁ with q₁:=argmax_(q∈ℳ) ₁ σ_(q) ²(l) and ℳ₁:={1, 2, . . . , Q}. Assuming that the power maximum is created by a dominant directional signal, and considering the fact that using a HOA representation of finite order N results in a spatial dispersion of directional signals (cf. the above-mentioned “Plane-wave decomposition . . . ” article), it can be concluded that in the directional neighbourhood of Ω_(CURRDOM,1)(l) there should occur power components belonging to the same directional signal. Since the spatial signal dispersion can be expressed by the function v_(N)(Θ_(q,q) ₁ ) (see eq. (38)), where Θ_(q,q) ₁ :=∠(Ω_(q),Ω_(q) ₁ ) denotes the angle between Ω_(q) and Ω_(CURRDOM,1)(l), the power belonging to the directional signal declines according to v_(N) ²(Θ_(q,q) ₁ ). Therefore it is reasonable to exclude all directions Ω_(q) in the directional neighbourhood of Ω_(q) ₁ with Θ_(q,q) ₁ ≦Θ_(MIN) for the search of further dominant directions. The distance Θ_(MIN) can be chosen as the first zero of v_(N)(x), which is approximately given by π/N for N≧4. The second dominant direction is then set to that with the maximum power in the remaining directions Ω_(q) with q∈ℳ₂, where ℳ₂:={q∈ℳ₁|Θ_(q,q) ₁ >Θ_(MIN)}. The remaining dominant directions are determined in an analogous way.

The number {tilde over (D)}(l) of dominant directions can be determined by regarding the powers σ_(q) _(d) ²(l) assigned to the individual dominant directions Ω_(q) _(d) and searching for the case where the ratio σ_(q) ₁ ²(l)/σ_(q) _(d) ²(l) exceeds the value of a desired direct to ambient power ratio DAR_(MIN). This means that {tilde over (D)}(l) satisfies

$\begin{matrix}{10\log_{10}\left( \frac{\sigma_{q_{1}}^{2}(l)}{\sigma_{q_{\tilde{D}(l)}}^{2}(l)} \right) \leq {DAR}_{MIN} \land \left\lbrack 10\log_{10}\left( \frac{\sigma_{q_{1}}^{2}(l)}{\sigma_{q_{\tilde{D}(l) + 1}}^{2}(l)} \right) > {DAR}_{MIN} \lor \tilde{D}(l) = D \right\rbrack.} & (81)\end{matrix}$

The overall processing for the computation of all dominant directions can be carried out as follows:

Algorithm 1 Search of dominant directions given power distribution on the sphere

  PowerFlag = true
  {tilde over (d)} = 1
  ℳ₁ = {1, 2, . . . , Q}
  repeat
    $q_{\tilde{d}} = \underset{q \in \mathcal{M}_{\tilde{d}}}{argmax}\;\sigma_{q}^{2}(l)$
    if $\left\lbrack \tilde{d} > 1 \land 10\log_{10}\left( \frac{\sigma_{q_{1}}^{2}(l)}{\sigma_{q_{\tilde{d}}}^{2}(l)} \right) > {DAR}_{MIN} \right\rbrack$ then
      PowerFlag = false
    else
      Ω_(CURRDOM,{tilde over (d)})(l) = Ω_(q) _({tilde over (d)})
      ℳ_({tilde over (d)}+1) = {q∈ℳ_({tilde over (d)}) | ∠(Ω_(q),Ω_(q) _({tilde over (d)})) > Θ_(MIN)}
      {tilde over (d)} = {tilde over (d)} + 1
    end if
  until $\left\lbrack \tilde{d} > D \lor \text{PowerFlag} = \text{false} \right\rbrack$
  {tilde over (D)}(l) = {tilde over (d)} − 1
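Algorithm 1 can be sketched in executable form as follows. The data layout (a list of powers and a list of (inclination, azimuth) pairs), the function names, and the early exit when no candidate directions remain or a power is not positive are assumptions made for illustration; the angle between two directions is computed with eq. (39).

```python
import math

def angle_between(a, b):
    """Angle between two directions given as (inclination, azimuth), eq. (39)."""
    (t1, p1), (t2, p2) = a, b
    cos_angle = (math.cos(t1) * math.cos(t2)
                 + math.cos(p1 - p2) * math.sin(t1) * math.sin(t2))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def search_dominant_directions(power, directions, D, dar_min_db, theta_min):
    """Return the indices q_1, q_2, ... of the dominant test directions."""
    candidates = set(range(len(power)))
    chosen = []
    while len(chosen) < D and candidates:
        q = max(candidates, key=lambda i: power[i])
        if power[q] <= 0:
            break                                  # guard added for the sketch
        if chosen:
            ratio_db = 10 * math.log10(power[chosen[0]] / power[q])
            if ratio_db > dar_min_db:
                break                              # PowerFlag = false
        chosen.append(q)
        # Exclude the neighbourhood of the direction that was just selected.
        candidates = {i for i in candidates
                      if angle_between(directions[i], directions[q]) > theta_min}
    return chosen

# Six test directions on the equator with two clusters and one weak direction.
dirs = [(math.pi / 2, phi) for phi in (0.0, 0.1, 0.2, 2.0, 2.1, 4.0)]
powers = [10.0, 8.0, 6.0, 5.0, 4.0, 0.01]
found = search_dominant_directions(powers, dirs, D=3, dar_min_db=15.0,
                                   theta_min=math.pi / 4)
print(found)  # [0, 3]
```

In this example the third candidate is 30 dB below the strongest direction and is therefore rejected, so only two dominant directions are returned although D=3.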

Next, the directions Ω_(CURRDOM,{tilde over (d)})(l), 1≦{tilde over (d)}≦{tilde over (D)}(l), obtained in the current frame are smoothed with the directions from the previous frames, resulting in smoothed directions Ω _(DOM,d)(l), 1≦d≦D. This operation can be subdivided into two successive parts:

-   (a) The current dominant directions Ω_(CURRDOM,{tilde over (d)})(l), 1≦{tilde over (d)}≦{tilde over (D)}(l), are assigned to the smoothed directions Ω _(DOM,d)(l−1), 1≦d≦D, from the previous frame. The assignment function ƒ_(A,l):{1, . . . , {tilde over (D)}(l)}→{1, . . . , D} is determined such that the sum of angles between assigned directions

Σ_({tilde over (d)}=1) ^({tilde over (D)}(l))∠(Ω_(CURRDOM,{tilde over (d)})(l), Ω _(DOM,ƒ) _(A,l) _(({tilde over (d)}))(l−1))  (82)

is minimised. Such an assignment problem can be solved using the well-known Hungarian algorithm, cf. H. W. Kuhn, “The Hungarian method for the assignment problem”, Naval research logistics quarterly 2, no. 1-2, pp. 83-97, 1955. The angles between current directions Ω_(CURRDOM,{tilde over (d)})(l) and inactive directions (see below for explanation of the term ‘inactive direction’) from the previous frame Ω _(DOM,d)(l−1) are set to 2Θ_(MIN). This operation has the effect that current directions Ω_(CURRDOM,{tilde over (d)})(l) which are closer than 2Θ_(MIN) to previously active directions Ω _(DOM,d)(l−1), are attempted to be assigned to them. If the distance exceeds 2Θ_(MIN), the corresponding current direction is assumed to belong to a new signal, which means that it is favoured to be assigned to a previously inactive direction Ω _(DOM,d)(l−1). Remark: when allowing a greater latency of the overall compression algorithm, the assignment of successive direction estimates may be performed more robustly. For example, abrupt direction changes may be better identified without mixing them up with outliers resulting from estimation errors.

-   (b) The smoothed directions Ω _(DOM,d)(l), 1≦d≦D, are computed using the assignment from step (a). The smoothing is based on spherical geometry rather than Euclidean geometry. For each of the current dominant directions Ω_(CURRDOM,{tilde over (d)})(l), 1≦{tilde over (d)}≦{tilde over (D)}(l), the smoothing is performed along the minor arc of the great circle crossing the two points on the sphere, which are specified by the directions Ω_(CURRDOM,{tilde over (d)})(l) and Ω _(DOM,d)(l−1). Explicitly, the azimuth and inclination angles are smoothed independently by computing the exponentially-weighted moving average with a smoothing factor α_(Ω). For the inclination angle this results in the following smoothing operation:

θ _(DOM,ƒ) _(A,l) _(({tilde over (d)}))(l)=(1−α_(Ω))· θ _(DOM,ƒ) _(A,l) _(({tilde over (d)}))(l−1)+α_(Ω)·θ_(DOM,{tilde over (d)})(l), 1≦{tilde over (d)}≦{tilde over (D)}(l).  (83)

-   -   For the azimuth angle the smoothing has to be modified to achieve a correct smoothing at the transition from π−ε to −π, ε>0, and the transition in the opposite direction. This can be taken into consideration by first computing the difference angle modulo 2π as

$\begin{matrix}{\Delta_{\varphi,[0,2\pi[,\tilde{d}}(l) := \left[ \varphi_{DOM,\tilde{d}}(l) - \bar{\varphi}_{DOM,f_{A,l}(\tilde{d})}(l-1) \right] \bmod 2\pi,} & (84)\end{matrix}$

-   -   which is converted to the interval [−π,π[ by

$\begin{matrix}{\Delta_{\varphi,[-\pi,\pi[,\tilde{d}}(l) := \begin{cases} \Delta_{\varphi,[0,2\pi[,\tilde{d}}(l) & \text{for } \Delta_{\varphi,[0,2\pi[,\tilde{d}}(l) < \pi \\ \Delta_{\varphi,[0,2\pi[,\tilde{d}}(l) - 2\pi & \text{for } \Delta_{\varphi,[0,2\pi[,\tilde{d}}(l) \ge \pi \end{cases}.} & (85)\end{matrix}$

-   -   The smoothed dominant azimuth angle modulo 2π is determined as

$\begin{matrix}{\bar{\varphi}_{DOM,[0,2\pi[,\tilde{d}}(l) := \left[ \bar{\varphi}_{DOM,\tilde{d}}(l-1) + \alpha_{\Omega} \cdot \Delta_{\varphi,[-\pi,\pi[,\tilde{d}}(l) \right] \bmod 2\pi} & (86)\end{matrix}$

-   -   and is finally converted to lie within the interval [−π,π[ by

$\begin{matrix}{\bar{\varphi}_{DOM,\tilde{d}}(l) = \begin{cases} \bar{\varphi}_{DOM,[0,2\pi[,\tilde{d}}(l) & \text{for } \bar{\varphi}_{DOM,[0,2\pi[,\tilde{d}}(l) < \pi \\ \bar{\varphi}_{DOM,[0,2\pi[,\tilde{d}}(l) - 2\pi & \text{for } \bar{\varphi}_{DOM,[0,2\pi[,\tilde{d}}(l) \ge \pi \end{cases}.} & (87)\end{matrix}$
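As a non-limiting illustration, the angle smoothing of eqs. (83) to (87) can be sketched in Python as follows. The function names are hypothetical and not part of the description; angles are assumed to be given in radians, and `alpha` stands for the smoothing factor α_(Ω).

```python
import math

def wrap_to_pi(angle):
    """Map an angle in radians to the interval [-pi, pi[ (cf. eqs. (85) and (87))."""
    a = angle % (2.0 * math.pi)              # first map to [0, 2*pi[
    return a - 2.0 * math.pi if a >= math.pi else a

def smooth_inclination(theta_prev, theta_cur, alpha):
    """Exponentially-weighted moving average of eq. (83)."""
    return (1.0 - alpha) * theta_prev + alpha * theta_cur

def smooth_azimuth(phi_prev, phi_cur, alpha):
    """Azimuth smoothing across the -pi/pi transition, eqs. (84) to (87)."""
    delta = wrap_to_pi(phi_cur - phi_prev)         # eqs. (84) and (85)
    return wrap_to_pi(phi_prev + alpha * delta)    # eqs. (86) and (87)
```

With a previous azimuth just below π and a current azimuth just above −π, the smoothed value stays near the transition instead of being pulled towards zero, which is what plain averaging of the two angles would do.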

In case {tilde over (D)}(l)<D, there are directions Ω _(DOM,d)(l−1) from the previous frame that do not get an assigned current dominant direction. The corresponding index set is denoted by

_(NA)(l):={1, . . . ,D}\{ƒ _(A,l)({tilde over (d)})|1≦{tilde over (d)}≦D}.  (88)

The respective directions are copied from the last frame, i.e.

Ω _(DOM,d)(l)= Ω _(DOM,d)(l−1) for d∈ _(NA)(l).  (89)
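As a non-limiting illustration, the assignment of step (a) and the determination of the non-assigned index set of eq. (88) can be sketched as follows. This sketch replaces the Hungarian algorithm by an exhaustive search over all assignments, which yields the same minimum but is only practical for a small number D of directions. The function names and the representation of a direction as an (inclination, azimuth) pair are assumptions made for the example.

```python
import itertools
import math

def angle_between(a, b):
    """Angle between two directions given as (inclination, azimuth) in radians."""
    (t1, p1), (t2, p2) = a, b
    c = math.cos(t1) * math.cos(t2) + math.sin(t1) * math.sin(t2) * math.cos(p1 - p2)
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding errors

def assign_directions(current, previous, active, theta_min):
    """Return (f, not_assigned); f[i] is the previous index assigned to current[i].

    Angles to inactive previous directions are set to 2 * theta_min.
    Requires len(current) <= len(previous).
    """
    cost = [[angle_between(c, p) if act else 2.0 * theta_min
             for p, act in zip(previous, active)] for c in current]
    # Exhaustive search (stand-in for the Hungarian algorithm).
    f = min(itertools.permutations(range(len(previous)), len(current)),
            key=lambda perm: sum(cost[i][d] for i, d in enumerate(perm)))
    not_assigned = sorted(set(range(len(previous))) - set(f))   # cf. eq. (88)
    return f, not_assigned
```

The directions whose indices are returned in `not_assigned` would then be copied from the previous frame according to eq. (89).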

Directions which are not assigned for a predefined number L_(IA) of frames are termed inactive.

Thereafter the index set of active directions, denoted by _(ACT)(l), is computed. Its cardinality is denoted by D_(ACT)(l):=| _(ACT)(l)|.

Then all smoothed directions are concatenated into a single direction matrix as

Ω _(DOM)(l):=[ Ω _(DOM,1)(l) Ω _(DOM,2)(l) . . . Ω _(DOM,D)(l)].  (90)

Computation of Direction Signals

The computation of the direction signals is based on mode matching. In particular, a search is made for those directional signals whose HOA representation results in the best approximation of the given HOA signal. Because the changes of the directions between successive frames can lead to a discontinuity of the directional signals, estimates of the directional signals for overlapping frames can be computed, followed by smoothing the results of successive overlapping frames using an appropriate window function. The smoothing, however, introduces a latency of a single frame.

The detailed estimation of the directional signals is explained in the following:

First, the mode matrix based on the smoothed active directions is computed according to

$\begin{matrix}{\Xi_{ACT}(l) := \left[ S_{DOM,d_{ACT,1}}(l) \;\; S_{DOM,d_{ACT,2}}(l) \;\; \ldots \;\; S_{DOM,d_{ACT,D_{ACT}(l)}}(l) \right] \in \mathbb{R}^{O \times D_{ACT}(l)}} & (91)\end{matrix}$

with

S _(DOM,d)(l):=[S ₀ ⁰( Ω _(DOM,d)(l)),S ₁ ⁻¹( Ω _(DOM,d)(l)),S ₁ ⁰( Ω _(DOM,d)(l)), . . . ,S _(N) ^(N)( Ω _(DOM,d)(l))]^(T)∈ℝ^(O),  (92)

wherein d_(ACT,j), 1≦j≦D_(ACT)(l) denotes the indices of the active directions.

Next, a matrix X_(INST)(l) is computed that contains the non-smoothed estimates of all directional signals for the (l−1)-th and l-th frame:

X _(INST)(l):=[x _(INST)(l,1) x _(INST)(l,2) . . . x _(INST)(l,2B)]∈ℝ^(D×2B)  (93)

with

x _(INST)(l,j):=[x _(INST,1)(l,j),x _(INST,2)(l,j), . . . ,x _(INST,D)(l,j)]^(T)∈ℝ^(D), 1≦j≦2B.  (94)

This is accomplished in two steps. In the first step, the directional signal samples in the rows corresponding to inactive directions are set to zero, i.e.

x _(INST,d)(l,j)=0, ∀1≦j≦2B, if d∉ _(ACT)(l).  (95)

In the second step, the directional signal samples corresponding to active directions are obtained by first arranging them in a matrix according to

$\begin{matrix}{X_{INST,ACT}(l) = \begin{bmatrix} x_{INST,d_{ACT,1}}(l,1) & \ldots & x_{INST,d_{ACT,1}}(l,2B) \\ \vdots & \ddots & \vdots \\ x_{INST,d_{ACT,D_{ACT}(l)}}(l,1) & \ldots & x_{INST,d_{ACT,D_{ACT}(l)}}(l,2B) \end{bmatrix}.} & (96)\end{matrix}$

This matrix is then computed such as to minimise the Euclidean norm of the error

Ξ_(ACT)(l)X _(INST,ACT)(l)−[C(l−1)C(l)].  (97)

The solution is given by

X _(INST,ACT)(l)=[Ξ_(ACT) ^(T)(l)Ξ_(ACT)(l)]⁻¹Ξ_(ACT)^(T)(l)[C(l−1)C(l)].  (98)
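As a non-limiting illustration, the two steps of eqs. (95) to (98) can be sketched with NumPy as follows. The function and argument names are hypothetical, indices are zero-based, and the mode matrix is assumed to be supplied by the caller, since the evaluation of the Spherical Harmonics is not part of this sketch. The least-squares solver used here yields the same result as eq. (98) whenever the mode matrix has full column rank.

```python
import numpy as np

def directional_signals(xi_act, c_prev, c_cur, active_idx, num_dirs):
    """Non-smoothed directional signals X_INST(l), shape (D, 2B).

    xi_act:      mode matrix of the active directions, shape (O, D_ACT)
    c_prev/_cur: HOA frames C(l-1) and C(l), each of shape (O, B)
    active_idx:  zero-based row indices of the active directions
    num_dirs:    total number D of directional signals
    """
    c_long = np.hstack([c_prev, c_cur])                 # [C(l-1) C(l)]
    # Minimise the Euclidean norm of Xi_ACT X - [C(l-1) C(l)], eq. (97).
    x_act, *_ = np.linalg.lstsq(xi_act, c_long, rcond=None)
    x_inst = np.zeros((num_dirs, c_long.shape[1]))      # eq. (95): inactive rows are zero
    x_inst[active_idx, :] = x_act
    return x_inst
```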

The estimates of the directional signals x_(INST,d)(l,j), 1≦d≦D, are windowed by an appropriate window function w(j):

x _(INST,WIN,d)(l,j):=x _(INST,d)(l,j)·w(j), 1≦j≦2B.  (99)

An example for the window function is given by the periodic Hamming window defined by

$\begin{matrix}{w(j) := \begin{cases} K_{w}\left[ 0.54 - 0.46 \cos\left( \frac{2\pi j}{2B + 1} \right) \right] & \text{for } 1 \le j \le 2B \\ 0 & \text{else} \end{cases},} & (100)\end{matrix}$

where K_(w) denotes a scaling factor which is determined such that the sum of the shifted windows equals ‘1’. The smoothed directional signals for the (l−1)-th frame are computed by the appropriate superposition of windowed non-smoothed estimates according to

x _(d)((l−1)B+j)=x _(INST,WIN,d)(l−1,B+j)+x _(INST,WIN,d)(l,j).  (101)
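As a non-limiting illustration, the windowing of eqs. (99) and (100) and the superposition of eq. (101) can be sketched for a single directional signal as follows. The function names are hypothetical. The value K_(w) = 1/1.08 is an assumption of this sketch; with the denominator 2B+1 of eq. (100), the sum of two windows shifted by B samples is then close to, but not exactly, one.

```python
import math

def hamming_window(B):
    """Window w(j) for 1 <= j <= 2B as in eq. (100); list index 0 holds w(1)."""
    k_w = 1.0 / 1.08   # assumed scaling factor K_w (0.54 + 0.54 = 1.08)
    return [k_w * (0.54 - 0.46 * math.cos(2.0 * math.pi * j / (2 * B + 1)))
            for j in range(1, 2 * B + 1)]

def overlap_add(x_win_prev, x_win_cur, B):
    """Smoothed samples of frame l-1 from two windowed long frames, eq. (101)."""
    # Second half of the previous long frame plus first half of the current one.
    return [x_win_prev[B + j] + x_win_cur[j] for j in range(B)]
```

For example, with B = 1200 (a 25 ms frame at a sampling rate of 48 kHz), windowing a constant signal of value one and applying `overlap_add` returns values that deviate from one by less than 0.001.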

The samples of all smoothed directional signals for the (l−1)-th frame are arranged in matrix X(l−1) as

X(l−1):=[x((l−1)B+1) x((l−1)B+2) . . . x((l−1)B+B)]∈ℝ^(D×B)  (102)

with

x(j)=[x ₁(j),x ₂(j), . . . ,x _(D)(j)]^(T)∈ℝ^(D).  (103)

Computation of Ambient HOA Component

The ambient HOA component C_(A)(l−1) is obtained by subtracting the total directional HOA component C_(DIR)(l−1) from the total HOA representation C(l−1) according to

C _(A)(l−1):=C(l−1)−C _(DIR)(l−1)∈ℝ^(O×B),  (104)

where C_(DIR)(l−1) is determined by

$\begin{matrix}{{C_{DIR}\left( {l - 1} \right)} = {{\Xi_{DOM}\left( {l - 1} \right)}{\quad{{\begin{bmatrix}{x_{{INST},{WIN},1}\left( {{l - 1},{B + 1}} \right)} & \; & {x_{{INST},{WIN},1}\left( {{l - 1},{2B}} \right)} \\\vdots & \ddots & \vdots \\{x_{{INST},{WIN},D}\left( {{l - 1},{B + 1}} \right)} & \; & {x_{{INST},{WIN},D}\left( {{l - 1},{2B}} \right)}\end{bmatrix} + {{\Xi_{DOM}(l)}\begin{bmatrix}{x_{{INST},{WIN},1}\left( {l,1} \right)} & \; & {x_{{INST},{WIN},1}\left( {l,B} \right)} \\\vdots & \ddots & \vdots \\{x_{{INST},{WIN},D}\left( {l,1} \right)} & \; & {x_{{INST},{WIN},D}\left( {l,B} \right)}\end{bmatrix}}},}}}} & (105)\end{matrix}$

and where Ξ_(DOM)(l) denotes the mode matrix based on all smootheddirections defined by

Ξ_(DOM)(l):=[S _(DOM,1)(l)S _(DOM,2)(l) . . . S _(DOM,D)(l)]∈

^(O×D).  (106)

Because the computation of the total directional HOA component is also based on a spatial smoothing of overlapping successive instantaneous total directional HOA components, the ambient HOA component is also obtained with a latency of a single frame.

Order Reduction for Ambient HOA Component

Expressing C_(A)(l−1) through its components as

$\begin{matrix}{{C_{A}\left( {l - 1} \right)} = {\quad{\begin{bmatrix}{c_{0,A}^{0}\left( {{\left( {l - 1} \right)B} + 1} \right)} & \; & {c_{0,A}^{0}\left( {{\left( {l - 1} \right)B} + B} \right)} \\\vdots & \ddots & \vdots \\{c_{N,A}^{N}\left( {{\left( {l - 1} \right)B} + 1} \right)} & \; & {c_{N,A}^{N}\left( {{\left( {l - 1} \right)B} + B} \right)}\end{bmatrix},}}} & (107)\end{matrix}$

the order reduction is accomplished by dropping all HOA coefficients c_(n,A) ^(m)(j) with n>N_(RED):

$\begin{matrix}{{C_{A,{RED}}\left( {l - 1} \right)} = {\quad{\begin{bmatrix}{c_{0,A}^{0}\left( {{\left( {l - 1} \right)B} + 1} \right)} & \; & {c_{0,A}^{0}\left( {{\left( {l - 1} \right)B} + B} \right)} \\\vdots & \ddots & \vdots \\{c_{N_{RED},A}^{N_{RED}}\left( {{\left( {l - 1} \right)B} + 1} \right)} & \; & {c_{N_{RED},A}^{N_{RED}}\left( {{\left( {l - 1} \right)B} + B} \right)}\end{bmatrix} \in {{\mathbb{R}}^{O_{RED} \times B}.}}}} & (108)\end{matrix}$

Spherical Harmonic Transform for Ambient HOA Component

The Spherical Harmonic Transform is performed by the multiplication of the ambient HOA component of reduced order C_(A,RED)(l) with the inverse of the mode matrix

Ξ_(A):=[S_(A,1) S_(A,2) . . . S_(A,O_(RED))]∈ℝ^(O_(RED)×O_(RED))  (109)

with

S_(A,d):=[S₀ ⁰(Ω_(A,d)),S₁ ⁻¹(Ω_(A,d)),S₁ ⁰(Ω_(A,d)), . . . ,S_(N_(RED)) ^(N_(RED))(Ω_(A,d))]^(T)∈ℝ^(O_(RED)),  (110)

based on O_(RED) uniformly distributed directions Ω_(A,d), 1≦d≦O_(RED):

W _(A,RED)(l)=(Ξ_(A))⁻¹ C _(A,RED)(l).  (111)

Decompression

Inverse Spherical Harmonic Transform

The perceptually decompressed spatial domain signals Ŵ_(A,RED)(l) are transformed to a HOA domain representation Ĉ_(A,RED)(l) of order N_(RED) via an Inverse Spherical Harmonics Transform by

Ĉ _(A,RED)(l)=Ξ_(A) Ŵ _(A,RED)(l).  (112)

Order Extension

The Ambisonics order of the HOA representation Ĉ_(A,RED)(l) is extended to N by appending zeros according to

$\begin{matrix}{{{{\hat{C}}_{A}(l)}:={\begin{bmatrix}{{\hat{C}}_{A,{RED}}(l)} \\0_{{({O - O_{RED}})} \times B}\end{bmatrix} \in {\mathbb{R}}^{O \times B}}},} & (113)\end{matrix}$

where 0_(m×n) denotes a zero matrix with m rows and n columns.
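As a non-limiting illustration, the transform pair of eqs. (111) and (112) and the order extension of eq. (113) can be sketched with NumPy as follows. The function names are hypothetical, and the square mode matrix Ξ_(A) is assumed to be supplied by the caller and to be invertible.

```python
import numpy as np

def to_spatial(xi_a, c_a_red):
    """Spherical Harmonic Transform of eq. (111): W = inv(Xi_A) C."""
    # Solving Xi_A W = C avoids forming the inverse explicitly.
    return np.linalg.solve(xi_a, c_a_red)

def from_spatial(xi_a, w_a_red):
    """Inverse Spherical Harmonic Transform of eq. (112)."""
    return xi_a @ w_a_red

def extend_order(c_a_red, num_coeffs):
    """Order extension of eq. (113): append zero rows up to O = num_coeffs."""
    o_red, frame_len = c_a_red.shape
    return np.vstack([c_a_red, np.zeros((num_coeffs - o_red, frame_len))])
```

Without perceptual coding in between, applying `to_spatial` followed by `from_spatial` returns the reduced-order ambient component up to rounding errors; the coefficients of orders above N_(RED) remain zero after `extend_order`.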

HOA Coefficients Composition

The final decompressed HOA coefficients are additively composed of the directional and the ambient HOA component according to

Ĉ(l−1):=Ĉ _(A)(l−1)+Ĉ _(DIR)(l−1).  (114)

At this stage, once again a latency of a single frame is introduced to allow the directional HOA component to be computed based on spatial smoothing. By doing this, potential undesired discontinuities in the directional component of the sound field resulting from the changes of the directions between successive frames are avoided.

To compute the smoothed directional HOA component, two successive frames containing the estimates of all individual directional signals are concatenated into a single long frame as

X̂ _(INST)(l):=[X̂(l−1) X̂(l)]∈ℝ^(D×2B).  (115)

Each of the individual signal excerpts contained in this long frame is multiplied by a window function, e.g. like that of eq. (100). When expressing the long frame X̂_(INST)(l) through its components by

$\begin{matrix}{{{{\hat{X}}_{INST}(l)} = \begin{bmatrix}{{\hat{x}}_{{INST},1}\left( {l,1} \right)} & \; & {{\hat{x}}_{{INST},1}\left( {l,{2B}} \right)} \\\vdots & \ddots & \vdots \\{{\hat{x}}_{{INST},D}\left( {l,1} \right)} & \; & {{\hat{x}}_{{INST},D}\left( {l,{2B}} \right)}\end{bmatrix}},} & (116)\end{matrix}$

the windowing operation can be formulated as computing the windowed signal excerpts x̂_(INST,WIN,d)(l,j), 1≦d≦D, by

x̂ _(INST,WIN,d)(l,j)=x̂ _(INST,d)(l,j)·w(j), 1≦j≦2B, 1≦d≦D.  (117)

Finally, the total directional HOA component Ĉ_(DIR)(l−1) is obtained by encoding all the windowed directional signal excerpts into the appropriate directions and superposing them in an overlapped fashion:

$\begin{matrix}{\hat{C}_{DIR}(l-1) = \Xi_{DOM}(l-1) \begin{bmatrix} \hat{x}_{INST,WIN,1}(l-1,B+1) & \ldots & \hat{x}_{INST,WIN,1}(l-1,2B) \\ \vdots & \ddots & \vdots \\ \hat{x}_{INST,WIN,D}(l-1,B+1) & \ldots & \hat{x}_{INST,WIN,D}(l-1,2B) \end{bmatrix} + \Xi_{DOM}(l) \begin{bmatrix} \hat{x}_{INST,WIN,1}(l,1) & \ldots & \hat{x}_{INST,WIN,1}(l,B) \\ \vdots & \ddots & \vdots \\ \hat{x}_{INST,WIN,D}(l,1) & \ldots & \hat{x}_{INST,WIN,D}(l,B) \end{bmatrix}.} & (118)\end{matrix}$
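As a non-limiting illustration, the overlapped superposition of eq. (118) can be sketched with NumPy as follows. The function and argument names are hypothetical; the mode matrices and the already windowed long frames are assumed to be supplied by the caller.

```python
import numpy as np

def compose_directional(xi_prev, xi_cur, x_win_prev, x_win_cur):
    """Total directional HOA component of frame l-1, shape (O, B).

    xi_prev, xi_cur:       mode matrices Xi_DOM(l-1) and Xi_DOM(l), shape (O, D)
    x_win_prev, x_win_cur: windowed long frames for l-1 and l, shape (D, 2B)
    """
    B = x_win_cur.shape[1] // 2
    # Second half of the previous long frame encoded with the previous directions,
    # first half of the current long frame encoded with the current directions.
    return xi_prev @ x_win_prev[:, B:] + xi_cur @ x_win_cur[:, :B]
```

The same structure appears at the compression side in eq. (105), with the non-decoded signals in place of the decoded ones.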

Explanation of Direction Search Algorithm

In the following, the motivation behind the direction search processing described in section ‘Estimation of dominant directions’ is explained. It is based on some assumptions which are defined first.

Assumptions

The HOA coefficients vector c(j), which is in general related to the time domain amplitude density function d(j,Ω) through

$\begin{matrix}{c(j) = \int_{S^{2}} d(j,\Omega)\, S(\Omega)\, d\Omega,} & (119)\end{matrix}$

is assumed to obey the following model:

$\begin{matrix}{c(j) = \sum\limits_{i=1}^{I} x_{i}(j)\, S(\Omega_{x_{i}}(l)) + c_{A}(j) \quad \text{for } lB + 1 \le j \le (l+1)B.} & (120)\end{matrix}$

This model states that the HOA coefficients vector c(j) is on one hand created by I dominant directional source signals x_(i)(j), 1≦i≦I, arriving from the directions Ω_(x_i)(l) in the l-th frame. In particular, the directions are assumed to be fixed for the duration of a single frame. The number of dominant source signals I is assumed to be distinctly smaller than the total number of HOA coefficients O. Further, the frame length B is assumed to be distinctly greater than O. On the other hand, the vector c(j) consists of a residual component c_(A)(j), which can be regarded as representing the ideally isotropic ambient sound field.

The individual HOA coefficient vector components are assumed to have the following properties:

-   -   The dominant source signals are assumed to be zero mean, i.e.

Σ_(j=lB+1) ^((l+1)B) x _(i)(j)≈0 ∀1≦i≦I,  (121)

-   -   and are assumed to be uncorrelated with each other, i.e.

$\begin{matrix}{{\frac{1}{B}{\sum\limits_{j = {{lB} + 1}}^{{({l + 1})}B}{{x_{i}(j)}x_{i}}}},{(j) \approx \delta_{i - i}},{{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}\mspace{14mu} {\forall{1 \leq i}}},{i^{\prime} \leq I}} & (122)\end{matrix}$

-   -   with σ _(x) _(i) ²(l) denoting the average power of the i-th        signal for the l-th frame.    -   The dominant source signals are assumed to be uncorrelated with        the ambient component of HOA coefficient vector, i.e.

$\begin{matrix}{{\frac{1}{B}{\sum\limits_{j = {{lB} + 1}}^{{({l + 1})}B}{{x_{i}(j)}{c_{A}(j)}}}} \approx {0\mspace{14mu} {\forall{1 \leq i \leq {I.}}}}} & (123)\end{matrix}$

-   -   The ambient HOA component vector is assumed to be zero mean and        is assumed to have the covariance matrix

$\begin{matrix}{{\sum\limits_{A}(l)}:={\frac{1}{B}{\sum\limits_{j = {{lB} + 1}}^{{({l + 1})}B}{{c_{A}(j)}{{c_{A}^{T}(j)}.}}}}} & (124)\end{matrix}$

-   -   The direct-to-ambient power ratio DAR(l) of each frame l, which        is here defined by

$\begin{matrix}{{{{DAR}(l)}:={10\; {\log_{10}\left\lbrack \frac{\max\limits_{1 \leq i \leq I}{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}}{{{\sum_{A}(l)}}^{2}} \right\rbrack}}},} & (125)\end{matrix}$

-   -   is assumed to be greater than a predefined desired value        DAR_(MIN), i.e.

DAR(l)≧DAR _(MIN).  (126)

Explanation of Direction Search

For the explanation the case is considered where the correlation matrix B(l) (see eq. (67)) is computed based only on the samples of the l-th frame without considering the samples of the L−1 previous frames. This operation corresponds to setting L=1. Consequently, the correlation matrix can be expressed by

$\begin{matrix}{B(l) = \frac{1}{B} C(l)\, C^{T}(l)} & (127) \\ {\phantom{B(l)} = \frac{1}{B}\sum\limits_{j=lB+1}^{(l+1)B} c(j)\, c^{T}(j).} & (128)\end{matrix}$

By substituting the model assumption in eq. (120) into eq. (128) and by using equations (122) and (123) and the definition in eq. (124), the correlation matrix B(l) can be approximated as

$\begin{matrix}\begin{matrix}{{B(l)} = {\frac{1}{B}{\sum\limits_{j = {{lB} + 1}}^{{({l + 1})}B}\left\lbrack {{\sum\limits_{i = 1}^{I}{{x_{i}(j)}{S\left( {\Omega_{x_{i}}(l)} \right)}}} + {c_{A}(j)}} \right\rbrack}}} \\{\left\lbrack {{\sum\limits_{i^{\prime} = 1}^{I}x_{i}},{{(j){S\left( {\Omega_{x_{i}},(l)} \right)}} + {c_{A}(j)}}} \right\rbrack^{T}} \\{{= {\sum\limits_{i = 1}^{I}{\sum\limits_{i^{\prime} = 1}^{I}{{S\left( {\Omega_{x_{i}},(l)} \right)}{S^{T}\left( {\Omega_{x_{i},}(l)} \right)}\frac{1}{B}{\sum\limits_{j = {{lB} + 1}}^{{({l + 1})}B}{{x_{i}(j)}x_{i}}}}}}},{(j) +}} \\{{{\sum\limits_{i = 1}^{I}{S\left( {\Omega_{x_{i}},(l)} \right)\frac{1}{B}{\sum\limits_{j = {{lB} + 1}}^{{({l + 1})}B}{{x_{i}(j)}{c_{A}^{T}(j)}}}}} +}} \\{{{\sum\limits_{i^{\prime} = 1}^{I}{\frac{1}{B}{\sum\limits_{j = {{lB} + 1}}^{{({l + 1})}B}x_{i}}}},{{(j)\; {c_{A}(j)}{S^{T}\left( {\Omega_{x_{i}},(l)} \right)}} +}}} \\{{\frac{1}{B}{\sum\limits_{j = {{lB} + 1}}^{{({l + 1})}B}{{c_{A}(j)}{c_{A}^{T}(j)}(130)}}}} \\{\approx {{\sum\limits_{i = 1}^{I}{{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}{S\left( {\Omega_{x_{i}}(l)} \right)}{S^{T}\left( {\Omega_{x_{i}}(l)} \right)}}} + {\sum\limits_{A}{(l).(131)}}}}\end{matrix} & (129)\end{matrix}$

From eq. (131) it can be seen that B(l) approximately consists of two additive components attributable to the directional and to the ambient HOA component. Its J(l)-rank approximation B_(J)(l) provides an approximation of the directional HOA component, i.e.

$\begin{matrix}{B_{J}(l) \approx \sum\limits_{i=1}^{I} \bar{\sigma}_{x_{i}}^{2}(l)\, S(\Omega_{x_{i}}(l))\, S^{T}(\Omega_{x_{i}}(l)),} & (132)\end{matrix}$

which follows from eq. (126) on the direct-to-ambient power ratio.

However, it should be stressed that some portion of Σ_(A)(l) will inevitably leak into B_(J)(l), since Σ_(A)(l) has full rank in general and thus, the subspaces spanned by the columns of the matrices Σ_(i=1) ^(I) σ _(x_i) ²(l)S(Ω_(x_i)(l))S^(T)(Ω_(x_i)(l)) and Σ_(A)(l) are not orthogonal to each other. With eq. (132) the vector σ²(l) in eq. (77), which is used for the search of the dominant directions, can be expressed by

$\begin{matrix}\begin{matrix}{{\sigma^{2}(l)} = {{diag}\left( {\Xi^{T}{B_{}(l)}\Xi} \right)}} \\{= {{{diag}\left( \begin{bmatrix}S^{T} & {\left( \Omega_{1} \right){B_{}(l)}{S\left( \Omega_{1} \right)}} & \; & S^{T} & {\left( \Omega_{1} \right){B_{}(l)}{S\left( \Omega_{Q} \right)}} \\\vdots & \; & \ddots & \vdots & \; \\S^{T} & {\left( \Omega_{Q} \right){B_{}(l)}{S\left( \Omega_{1} \right)}} & \; & S^{T} & {\left( \Omega_{Q} \right){B_{}(l)}{S\left( \Omega_{Q} \right)}}\end{bmatrix} \right)}(134)}} \\{\approx {{diag}\left( \begin{bmatrix}{\sum\limits_{i = 1}^{I}{{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}{v_{N}^{2}\left( {\angle \left( {\Omega_{1},\Omega_{x_{i}}} \right)} \right)}}} & \; & {\sum\limits_{i = 1}^{I}{{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}{v_{N}\left( {\angle \left( {\Omega_{1},\Omega_{x_{i}}} \right)} \right)}{v_{n}\left( {\angle \left( {\Omega_{x_{i}},\Omega_{Q}} \right)} \right)}}} \\\vdots & \ddots & \vdots \\{\sum\limits_{i = 1}^{I}{{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}{v_{N}\left( {\angle \left( {\Omega_{\Omega},\Omega_{x_{i}}} \right)} \right)}{v_{n}\left( {\angle \left( {\Omega_{x_{i}},\Omega_{1}} \right)} \right)}}} & \; & {\sum\limits_{i = 1}^{I}{{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}{v_{N}^{2}\left( {\angle \left( {\Omega_{Q},\Omega_{x_{i}}} \right)} \right)}}}\end{bmatrix} \right)}} \\{= {\begin{bmatrix}{\sum\limits_{i = 1}^{I}{{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}{v_{N}^{2}\left( {\angle \left( {\Omega_{1},\Omega_{x_{i}}} \right)} \right)}}} & \ldots & {\sum\limits_{i = 1}^{I}{{{\overset{\_}{\sigma}}_{x_{i}}^{2}(l)}{v_{N}^{2}\left( {\angle \left( {\Omega_{\Omega},\Omega_{x_{i}}} \right)} \right)}}}\end{bmatrix}^{T}.(136)}}\end{matrix} & (133)\end{matrix}$

In eq. (135) the following property of Spherical Harmonics shown in eq.(47) was used:

S ^(T)(Ω_(q))S(Ω_(q′))=v _(N)(∠(Ω_(q),Ω_(q′))).  (137)

Eq. (136) shows that the σ_(q) ²(l) components of σ²(l) are approximations of the powers of signals arriving from the test directions Ω_(q), 1≦q≦Q.
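As a non-limiting illustration, the vector σ²(l) of eq. (133) can be computed with NumPy without forming the full Q×Q matrix, since only its diagonal is needed. The function and argument names are hypothetical; `xi_grid` stands for the mode matrix Ξ of the Q test directions and `b_j` for the rank-reduced correlation matrix, both assumed to be supplied by the caller.

```python
import numpy as np

def directional_power(xi_grid, b_j):
    """Return diag(Xi^T B_J Xi) as a vector of length Q.

    xi_grid: mode matrix of the Q test directions, shape (O, Q)
    b_j:     correlation matrix approximation, shape (O, O)
    """
    # Entry q equals S(Omega_q)^T B_J S(Omega_q); off-diagonal terms are never formed.
    return np.sum(xi_grid * (b_j @ xi_grid), axis=0)
```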

1-9. (canceled)
10. A method for compressing a Higher Order Ambisonics HOA signal representation, said method comprising the steps: estimating dominant directions; decomposing or decoding the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals; compressing said residual ambient component by reducing its order as compared to its original order; transforming said residual ambient HOA component of reduced order to the spatial domain; perceptually encoding said dominant directional signals and said transformed residual ambient HOA component.
11. A method for decompressing a Higher Order Ambisonics HOA signal representation that was compressed by the steps: estimating dominant directions; decomposing or decoding the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals; compressing said residual ambient component by reducing its order as compared to its original order; transforming said residual ambient HOA component of reduced order to the spatial domain; perceptually encoding said dominant directional signals and said transformed residual ambient HOA component, said method comprising the steps: perceptually decoding said perceptually encoded dominant directional signals and said perceptually encoded transformed residual ambient HOA component; inverse transforming said perceptually decoded transformed residual ambient HOA component so as to get an HOA domain representation; performing an order extension of said inverse transformed residual ambient HOA component so as to establish an original-order ambient HOA component; composing said perceptually decoded dominant directional signals, said direction information and said original-order extended ambient HOA component so as to get an HOA signal representation.
12. The method according to claim 10, wherein incoming vectors of HOA coefficients are framed into non-overlapping frames, and wherein a frame duration can be 25 ms.
13. The method according to claim 10, wherein said dominant directions estimating is dependent on long overlapping groups of frames, such that for each current frame the content of adjacent frames is taken into consideration.
14. The method according to claim 10, wherein said dominant directional signals and said transformed ambient HOA component are jointly perceptually compressed.
15. The method according to claim 10, wherein said decomposing of the HOA signal representation into a number of dominant directional signals in time domain with related direction information and a residual ambient component in HOA domain is used for a signal-adaptive DirAC-like rendering of the HOA representation, wherein DirAC means Directional Audio Coding according to Pulkki.
16. The method according to claim 10, wherein said dominant direction estimation is dependent on a directional power distribution of the energetically dominant HOA components.
17. An apparatus for compressing a Higher Order Ambisonics HOA signal representation, said apparatus comprising: means adapted to estimate dominant directions; means adapted to decompose or decode the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals; means adapted to compress said residual ambient component by reducing its order as compared to its original order; means adapted to transform said residual ambient HOA component of reduced order to the spatial domain; means adapted to perceptually encode said dominant directional signals and said transformed residual ambient HOA component.
18. An apparatus for decompressing a Higher Order Ambisonics HOA signal representation that was compressed by the steps: estimating dominant directions; decomposing or decoding the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals; compressing said residual ambient component by reducing its order as compared to its original order; transforming said residual ambient HOA component of reduced order to the spatial domain; perceptually encoding said dominant directional signals and said transformed residual ambient HOA component, said apparatus comprising: means adapted to perceptually decode said perceptually encoded dominant directional signals and said perceptually encoded transformed residual ambient HOA component; means adapted to inverse transform said perceptually decoded transformed residual ambient HOA component so as to get an HOA domain representation; means adapted to perform an order extension of said inverse transformed residual ambient HOA component so as to establish an original-order ambient HOA component; means adapted to compose said perceptually decoded dominant directional signals, said direction information and said original-order extended ambient HOA component so as to get an HOA signal representation.
19. The apparatus according to claim 17, wherein incoming vectors of HOA coefficients are framed into non-overlapping frames, and wherein a frame duration can be 25 ms.
20. The apparatus according to claim 17, wherein said dominant directions estimating is dependent on long overlapping groups of frames, such that for each current frame the content of adjacent frames is taken into consideration.
21. The apparatus according to claim 17, wherein said dominant directional signals and said transformed ambient HOA component are jointly perceptually compressed.
22. The apparatus according to claim 17, wherein said decomposing of the HOA signal representation into a number of dominant directional signals in time domain with related direction information and a residual ambient component in HOA domain is used for a signal-adaptive DirAC-like rendering of the HOA representation, wherein DirAC means Directional Audio Coding according to Pulkki.
23. The apparatus according to claim 17, wherein said dominant direction estimation is dependent on a directional power distribution of the energetically dominant HOA components.
24. An apparatus for compressing a Higher Order Ambisonics HOA signal representation, wherein said apparatus is configured to: estimate dominant directions; decompose or decode the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals; compress said residual ambient component by reducing its order as compared to its original order; transform said residual ambient HOA component of reduced order to the spatial domain; perceptually encode said dominant directional signals and said transformed residual ambient HOA component.
25. An apparatus for decompressing a Higher Order Ambisonics HOA signal representation that was compressed by the steps: estimating dominant directions; decomposing or decoding the HOA signal representation into a number of dominant directional signals in time domain and related direction information, and a residual ambient component in HOA domain, wherein said residual ambient component represents the difference between said HOA signal representation and a representation of said dominant directional signals; compressing said residual ambient component by reducing its order as compared to its original order; transforming said residual ambient HOA component of reduced order to the spatial domain; perceptually encoding said dominant directional signals and said transformed residual ambient HOA component, wherein said decompressing apparatus is configured to: perceptually decode said perceptually encoded dominant directional signals and said perceptually encoded transformed residual ambient HOA component; inverse transform said perceptually decoded transformed residual ambient HOA component so as to get an HOA domain representation; perform an order extension of said inverse transformed residual ambient HOA component so as to establish an original-order ambient HOA component; compose said perceptually decoded dominant directional signals, said direction information and said original-order extended ambient HOA component so as to get an HOA signal representation.
26. The apparatus according to claim 24, wherein incoming vectors of HOA coefficients are framed into non-overlapping frames, and wherein a frame duration can be 25 ms.
27. The apparatus according to claim 24, wherein said dominant directions estimating is dependent on long overlapping groups of frames, such that for each current frame the content of adjacent frames is taken into consideration.
28. The apparatus according to claim 24, wherein said dominant directional signals and said transformed ambient HOA component are jointly perceptually compressed.
29. The apparatus according to claim 24, wherein said decomposing of the HOA signal representation into a number of dominant directional signals in time domain with related direction information and a residual ambient component in HOA domain is used for a signal-adaptive DirAC-like rendering of the HOA representation, wherein DirAC means Directional Audio Coding according to Pulkki.
30. The apparatus according to claim 24, wherein said dominant direction estimation is dependent on a directional power distribution of the energetically dominant HOA components.
31. An HOA signal that is compressed according to the method of claim 10.